Preface


Math is Exciting. We are living in the greatest age of mathematics ever 
seen. In the 1930s, there were some people who feared that the rising 
abstractions of the early twentieth century would either lead to mathe- 
maticians working on sterile, silly intellectual exercises or to mathematics 
splitting into sharply distinct subdisciplines, similar to the way natural 
philosophy split into physics, chemistry, biology and geology. But the very 
opposite has happened. Since World War II, it has become increasingly 
clear that mathematics is one unified discipline. What were separate areas 
now feed off of each other. Learning and creating mathematics is indeed a 
worthwhile way to spend one’s life. 


Math is Hard. Unfortunately, people are just not that good at mathemat- 
ics. While intensely enjoyable, it also requires hard work and self-discipline. 
I know of no serious mathematician who finds math easy. In fact, most, 
after a few beers, will confess as to how stupid and slow they are. This is 
one of the personal hurdles that a beginning graduate student must face, 
namely how to deal with the profundity of mathematics in stark comparison 
to our own shallow understandings of mathematics. This is in part why the 
attrition rate in graduate school is so high. At the best schools, with the 
most successful retention rates, usually only about half of the people who 
start eventually get their PhDs. Even schools that are in the top twenty 
have at times had eighty percent of their incoming graduate students not 
finish. This is in spite of the fact that most beginning graduate students 
are, in comparison to the general population, amazingly good at mathe- 
matics. Most have found that math is one area in which they could shine. 
Suddenly, in graduate school, they are surrounded by people who are just 
as good (and who seem even better). To make matters worse, mathematics 
is a meritocracy. The faculty will not go out of their way to make beginning 
students feel good (this is not the faculty’s job; their job is to discover new 
mathematics). The fact is that there are easier (though, for a mathemati- 
cian, less satisfying) ways to make a living. There is truth in the statement 
that you must be driven to become a mathematician.

Mathematics is exciting, though. The frustrations should more than be 
compensated for by the thrills of learning and eventually creating (or dis- 
covering) new mathematics. That is, after all, the main goal for attending 
graduate school, to become a research mathematician. As with all creative 
endeavors, there will be emotional highs and lows. Only jobs that are rou- 
tine and boring will not have these peaks and valleys. Part of the difficulty 
of graduate school is learning how to deal with the low times. 


Goal of Book. The goal of this book is to give people at least a rough idea
of the many topics that beginning graduate students at the best graduate 
schools are assumed to know. Since there is unfortunately far more that is 
needed to be known for graduate school and for research than it is possible 
to learn in a mere four years of college, few beginning students know all 
of these topics, but hopefully all will know at least some. Different people 
will know different topics. This strongly suggests the advantage of working 
with others. 

There is another goal. Many nonmathematicians suddenly find that 
they need to know some serious math. The prospect of struggling with a 
text will legitimately seem for them to be daunting. Each chapter of this 
book will provide for these folks a place where they can get a rough idea 
and outline of the topic they are interested in. 

As for general hints for helping sort out some mathematical field, cer- 
tainly one should always, when faced with a new definition, try to find a 
simple example and a simple non-example. A non-example, by the way, 
is an example that almost, but not quite, satisfies the definition. But be- 
yond finding these examples, one should examine the reason why the basic 
definitions were given. This leads to a split into two streams of thought 
for how to do mathematics. One can start with reasonable, if not naive, 
definitions and then prove theorems about these definitions. Frequently the 
statements of the theorems are complicated, with many different cases and 
conditions, and the proofs are quite convoluted, full of special tricks. 

The other, more mid-twentieth century approach, is to spend quite a 
bit of time on the basic definitions, with the goal of having the resulting 
theorems be clearly stated and having straightforward proofs. Under this 
philosophy, any time there is a trick in a proof, it means more work needs 
to be done on the definitions. It also means that the definitions themselves 
take work to understand, even at the level of figuring out why anyone would 
care. But now the theorems can be cleanly stated and proved. 

In this approach the role of examples becomes key. Usually there are 
basic examples whose properties are already known. These examples will 
shape the abstract definitions and theorems. The definitions in fact are 
made in order for the resulting theorems to give, for the examples, the
answers we expect. Only then can the theorems be applied to new examples 
and cases whose properties are unknown. 

For example, the correct notion of a derivative and thus of the slope of 
a tangent line is somewhat complicated. But whatever definition is chosen, 
the slope of a horizontal line (and hence the derivative of a constant func- 
tion) must be zero. If the definition of a derivative does not yield that a 
horizontal line has zero slope, it is the definition that must be viewed as 
wrong, not the intuition behind the example. 

For another example, consider the definition of the curvature of a plane 
curve, which is in Chapter Seven. The formulas are somewhat ungainly. 
But whatever the definitions, they must yield that a straight line has zero 
curvature, that at every point of a circle the curvature is the same and 
that the curvature of a circle with small radius must be greater than the 
curvature of a circle with a larger radius (reflecting the fact that it is easier 
to balance on the earth than on a basketball). If a definition of curvature 
does not do this, we would reject the definitions, not the examples. 
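
These three requirements can be checked numerically. The sketch below assumes the standard formula for the curvature of a parametrized plane curve (x(t), y(t)), namely |x'y'' - y'x''| / (x'^2 + y'^2)^(3/2); this formula and the function names are assumptions made for illustration and may differ in form from the definition given in Chapter Seven.

```python
import math

def curvature(dx, dy, ddx, ddy):
    """Curvature of a plane curve from its first and second derivatives at a point."""
    speed_sq = dx * dx + dy * dy
    return abs(dx * ddy - dy * ddx) / speed_sq ** 1.5

def circle_curvature(r, t):
    # Circle of radius r, parametrized as (r cos t, r sin t).
    dx, dy = -r * math.sin(t), r * math.cos(t)
    ddx, ddy = -r * math.cos(t), -r * math.sin(t)
    return curvature(dx, dy, ddx, ddy)

def line_curvature(a, b):
    # Straight line (a t, b t): both second derivatives vanish.
    return curvature(a, b, 0.0, 0.0)
```

With this formula a line has curvature zero, a circle of radius r has curvature 1/r at every point, and so a small circle curves more than a large one, as the examples demand.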

Thus it pays to know the key examples. When trying to undo the 
technical maze of a new subject, knowing these examples will not only help 
explain why the theorems and definitions are what they are but will even 
help in predicting what the theorems must be. 

Of course this is vague and ignores the fact that first proofs are almost 
always ugly and full of tricks, with the true insight usually hidden. But in 
learning the basic material, look for the key idea, the key theorem and then 
see how these shape the definitions. 


Caveats for Critics. This book is far from a rigorous treatment of any 
topic. There is a deliberate looseness in style and rigor. I am trying to get
the point across and to write in the way that most mathematicians talk to 
each other. The level of rigor in this book would be totally inappropriate 
in a research paper. 

Consider that there are three tasks for any intellectual discipline: 


1. Coming up with new ideas. 
2. Verifying new ideas. 
3. Communicating new ideas. 


How people come up with new ideas in mathematics (or in any other field) 
is overall a mystery. There are at best a few heuristics in mathematics, such 
as asking if something is unique or if it is canonical. It is in verifying new 
ideas that mathematicians are supreme. Our standard is that there must 
be a rigorous proof. Nothing else will do. This is why the mathematical
literature is so trustworthy (not that mistakes don’t creep in, but they 
are usually not major errors). In fact, I would go as far as to say that if 
any discipline has as its standard of verification rigorous proof, then that
discipline must be a part of mathematics. Certainly the main goal for a 
math major in the first few years of college is to learn what a rigorous proof 
is. 

Unfortunately, we do a poor job of communicating mathematics. Every 
year there are millions of people who take math courses. A large number 
of people who you meet on the street or on the airplane have taken college 
level mathematics. How many enjoyed it? How many saw no real point 
to it? While this book is not addressed to that random airplane person, 
it is addressed to beginning graduate students, people who already enjoy 
mathematics but who all too frequently get blown out of the mathematical 
water by mathematics presented in an unmotivated, but rigorous, manner. 
There is no problem with being nonrigorous, as long as you know and clearly 
label when you are being nonrigorous. 


Comments on the Bibliography. There are many topics in this book. 
While I would love to be able to say that I thoroughly know the literature 
on each of these topics, that would be a lie. The bibliography has been 
cobbled together from recommendations from colleagues, from books that 
I have taught from and books that I have used. I am confident that there 
are excellent texts that I do not know about. If you have a favorite, please 
let me know at tgarrity@williams.edu. 

While this book was being written, Paulo Ney De Souza and Jorge-Nuno 
Silva wrote Berkeley Problems in Mathematics [26], which is an excellent 
collection of problems that have appeared over the years on qualifying ex- 
ams (usually taken in the first or second year of graduate school) in the 
math department at Berkeley. In many ways, their book is the comple- 
ment of this one, as their work is the place to go to when you want to test 
your computational skills while this book concentrates on underlying intu- 
itions. For example, say you want to learn about complex analysis. You 
should first read chapter nine of this book to get an overview of the basics 
about complex analysis. Then choose a good complex analysis book and 
work most of its exercises. Then use the problems in De Souza and Silva
as a final test of your knowledge. 

Finally, the book Mathematics, Form and Function by Mac Lane [82] is
excellent. It provides an overview of much of mathematics. I am listing it 
here because there was no other place where it could be naturally referenced. 
Second and third year graduate students should seriously consider reading 
this book. 
On the Structure of
Mathematics 


If you look at articles in current journals, the range of topics seems immense. 
How could anyone even begin to make sense out of all of these topics? And 
indeed there is a glimmer of truth in this. People cannot effortlessly switch 
from one research field to another. But not all is chaos. There are at least 
two ways of placing some type of structure on all of mathematics. 


Equivalence Problems 


Mathematicians want to know when things are the same, or, when they are 
equivalent. What is meant by the same is what distinguishes one branch 
of mathematics from another. For example, a topologist will consider two 
geometric objects (technically, two topological spaces) to be the same if 
one can be twisted and bent, but not ripped, into the other. Thus for a 
topologist, we have 


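[Figure: several closed curves (such as a circle, an ellipse and a square), all equivalent to one another.]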


To a differential topologist, two geometric objects are the same if one 
can be smoothly bent and twisted into the other. By smooth we mean that 
no sharp edges can be introduced. Then 

[Figure: the circle is equivalent to the ellipse, but neither is equivalent to the square.]

The four sharp corners of the square are what prevent it from being
equivalent to the circle.

For a differential geometer, the notion of equivalence is even more re- 
strictive. Here two objects are the same not only if one can be smoothly 
bent and twisted into the other but also if the curvatures agree. Thus for 
the differential geometer, the circle is no longer equivalent to the ellipse: 


[Figure: a circle and an ellipse, which are not equivalent.]


As a first pass to placing structure on mathematics, we can view an area 
of mathematics as consisting of certain Objects, coupled with the notion of 
Equivalence between these objects. We can explain equivalence by looking 
at the allowed Maps, or functions, between the objects. At the beginning of 
most chapters, we will list the Objects and the Maps between the objects 
that are key for that subject. The Equivalence Problem is of course the 
problem of determining when two objects are the same, using the allowable 
maps. 

If the equivalence problem is easy to solve for some class of objects, 
then the corresponding branch of mathematics will no longer be active. 
If the equivalence problem is too hard to solve, with no known ways of 
attacking the problem, then the corresponding branch of mathematics will 
again not be active, though of course for opposite reasons. The hot areas 
of mathematics are precisely those for which there are rich partial but not 
complete answers to the equivalence problem. But what could we mean by 
a partial answer? 

Here enters the notion of invariance. Start with an example. Certainly 
the circle, as a topological space, is different from two circles, 


[Figure: two disjoint circles.]


since a circle has only one connected component and two circles have two 
connected components. We map each topological space to a positive integer, 
namely the number of connected components of the topological space. Thus 
we have: 

Topological Spaces → Positive Integers.


The key is that the number of connected components for a space cannot 
change under the notion of topological equivalence (under bendings and 
twistings). We say that the number of connected components is an invariant
of a topological space. Thus if the spaces map to different numbers, meaning 
that they have different numbers of connected components, then the two 
spaces cannot be topologically equivalent. 
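
A discrete stand-in may make the idea concrete. The sketch below replaces topological spaces by finite graphs (an assumption made purely for illustration) and counts connected components with a union-find structure. Renaming the vertices plays the role of an equivalence, and the count does not change under it.

```python
def count_components(num_vertices, edges):
    """Number of connected components of a graph on vertices 0..num_vertices-1."""
    parent = list(range(num_vertices))

    def find(v):
        # Follow parent links to the representative, shortening the path as we go.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    components = num_vertices
    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb      # merge two components into one
            components -= 1
    return components

one_circle = [(0, 1), (1, 2), (2, 3), (3, 0)]
two_circles = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)]
relabelled_circle = [(2, 0), (0, 3), (3, 1), (1, 2)]  # same cycle, vertices renamed
```

Here `one_circle` and `relabelled_circle` both give 1 while `two_circles` gives 2, so the invariant separates one circle from two but says nothing further about spaces with the same count.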

Of course, two spaces can have the same number of connected compo- 
nents and still be different. For example, both the circle and the sphere 


have only one connected component, but they are different. (These can 
be distinguished by looking at each space’s dimension, which is another 
topological invariant.) The goal of topology is to find enough invariants
to be able to always determine when two spaces are different or the same. 
This has not come close to being done. Much of algebraic topology maps 
each space not to invariant numbers but to other types of algebraic objects, 
such as groups and rings. Similar techniques show up throughout mathe- 
matics. This provides for tremendous interplay between different branches 
of mathematics. 


The Study of Functions 


The mantra that we should all chant each night before bed is: 


Functions describe the World. 


To a large extent what makes mathematics so useful to the world is that 
seemingly disparate real-world situations can be described by the same 
type of function. For example, think of how many different problems can 
be recast as finding the maximum or minimum of a function. 

Different areas of mathematics study different types of functions. Cal- 
culus studies differentiable functions from the real numbers to the real num- 
bers, algebra studies polynomials of degree one and two (in high school) 
and permutations (in college), linear algebra studies linear functions, or 
matrix multiplication. 

Thus in learning a new area of mathematics, you should always “find 
the function” of interest. Hence at the beginning of most chapters we will 
state the type of function that will be studied. 
Equivalence Problems in Physics


Physics is an experimental science. Hence any question in physics must 
eventually be answered by performing an experiment. But experiments 
come down to making observations, which usually are described by certain 
computable numbers, such as velocity, mass or charge. Thus the exper- 
iments in physics are described by numbers that are read off in the lab. 
More succinctly, physics is ultimately: 


Numbers in Boxes 


where the boxes are various pieces of lab machinery used to make mea- 
surements. But different boxes (different lab set-ups) can yield different 
numbers, even if the underlying physics is the same. This happens even at 
the trivial level of choice of units. 

More deeply, suppose you are modeling the physical state of a system 
as the solution of a differential equation. To write down the differential 
equation, a coordinate system must be chosen. The allowed changes of co- 
ordinates are determined by the physics. For example, Newtonian physics 
can be distinguished from Special Relativity in that each has different al- 
lowable changes of coordinates. 

Thus while physics is ‘Numbers in Boxes’, the true questions come down 
to when different numbers represent the same physics. But this is an equiv- 
alence problem; mathematics comes to the fore. (This explains in part the 
heavy need for advanced mathematics in physics.) Physicists want to find 
physics invariants. Usually, though, physicists call their invariants ‘Conser- 
vation Laws’. For example, in classical physics the conservation of energy 
can be recast as the statement that the function that represents energy is 
an invariant function. 


Brief Summaries of Topics 


0.1 Linear Algebra 


Linear algebra studies linear transformations and vector spaces, or in an- 
other language, matrix multiplication and the vector space $R^n$. You should
know how to translate between the language of abstract vector spaces and 
the language of matrices. In particular, given a basis for a vector space, 
you should know how to represent any linear transformation as a matrix. 
Further, given two matrices, you should know how to determine if these ma- 
trices actually represent the same linear transformation, but under different 
choices of bases. The key theorem of linear algebra is a statement that gives 
many equivalent descriptions for when a matrix is invertible. These equiv- 
alences should be known cold. You should also know why eigenvectors and 
eigenvalues occur naturally in linear algebra. 


0.2 Real Analysis 


The basic definitions of a limit, continuity, differentiation and integration 
should be known and understood in terms of $\epsilon$'s and $\delta$'s. Using this $\epsilon$ and $\delta$
language, you should be comfortable with the idea of uniform convergence 
of functions. 


0.3 Differentiating Vector-Valued Functions 


The goal of the Inverse Function Theorem is to show that a differentiable 
function $f : R^n \to R^n$ is locally invertible if and only if the determinant
of its derivative (the Jacobian) is non-zero. You should be comfortable 
with what it means for a vector-valued function to be differentiable, why 
its derivative must be a linear map (and hence representable as a matrix, 
the Jacobian) and how to compute the Jacobian. Further, you should know
the statement of the Implicit Function Theorem and see why it is closely
related to the Inverse Function Theorem.


0.4 Point Set Topology 


You should understand how to define a topology in terms of open sets and 
how to express the idea of continuous functions in terms of open sets. The 
standard topology on $R^n$ must be well understood, at least to the level of
the Heine-Borel Theorem. Finally, you should know what a metric space is 
and how a metric can be used to define open sets and hence a topology. 


0.5 Classical Stokes’ Theorems 


You should know about the calculus of vector fields. In particular, you 
should know how to compute, and know the geometric interpretations be- 
hind, the curl and the divergence of a vector field, the gradient of a function 
and the path integral along a curve. Then you should know the classical ex- 
tensions of the Fundamental Theorem of Calculus, namely the Divergence 
Theorem and Stokes’ Theorem. You should especially understand why 
these are indeed generalizations of the Fundamental Theorem of Calculus. 


0.6 Differential Forms and Stokes’ Theorem 


Manifolds are naturally occurring geometric objects. Differential k-forms 
are the tools for doing calculus on manifolds. You should know the various 
ways for defining a manifold, how to define and to think about differential k- 
forms, and how to take the exterior derivative of a k-form. You should also 
be able to translate from the language of k-forms and exterior derivatives 
to the language from Chapter Five on vector fields, gradients, curls and 
divergences. Finally, you should know the statement of Stokes’ Theorem, 
understand why it is a sharp quantitative statement about the equality of 
the integral of a k-form on the boundary of a (k + 1)-dimensional manifold 
with the integral of the exterior derivative of the k-form on the manifold, 
and how this Stokes’ Theorem has as special cases the Divergence Theorem 
and the Stokes’ Theorem from the previous chapter. 


0.7 Curvature for Curves and Surfaces 


Curvature, in all of its manifestations, attempts to measure the rate of 
change of the directions of tangent spaces of geometric objects. You should 
know how to compute the curvature of a plane curve, the curvature and
the torsion of a space curve and the two principal curvatures, in terms of 
the Hessian, of a surface in space. 


0.8 Geometry 


Different geometries are built out of different axiomatic systems. Given a 
line l and a point p not on l, Euclidean geometry assumes that there is 
exactly one line containing p parallel to l, hyperbolic geometry assumes 
that there is more than one line containing p parallel to l, and elliptic 
geometries assume that there is no line parallel to l. You should know 
models for hyperbolic geometry, single elliptic geometry and double elliptic 
geometry. Finally, you should understand why the existence of such models 
implies that all of these geometries are mutually consistent. 


0.9 Complex Analysis 


The main point is to recognize and understand the many equivalent ways 
for describing when a function can be analytic. Here we are concerned with 
functions $f : U \to C$, where $U$ is an open set in the complex numbers
$C$. You should know that such a function $f(z)$ is said to be analytic if it
satisfies any of the following equivalent conditions:

a) For all $z_0 \in U$,

\[ \lim_{z \to z_0} \frac{f(z) - f(z_0)}{z - z_0} \]

exists.

b) The real and imaginary parts of the function $f$ satisfy the
Cauchy-Riemann equations:

\[ \frac{\partial \operatorname{Re} f}{\partial x} = \frac{\partial \operatorname{Im} f}{\partial y} \]

and

\[ \frac{\partial \operatorname{Re} f}{\partial y} = -\frac{\partial \operatorname{Im} f}{\partial x}. \]

c) If $\gamma$ is any counterclockwise simple loop in $C = R^2$ and if $z_0$ is any complex
number in the interior of $\gamma$, then

\[ f(z_0) = \frac{1}{2\pi i} \int_{\gamma} \frac{f(z)}{z - z_0} \, dz. \]

This is the Cauchy Integral formula.

d) For any complex number $z_0$, there is an open neighborhood in $C = R^2$
of $z_0$ on which

\[ f(z) = \sum_{k=0}^{\infty} a_k (z - z_0)^k, \]

is a uniformly converging series.

Further, if $f : U \to C$ is analytic and if $f'(z_0) \neq 0$, then at $z_0$, the
function $f$ is conformal (i.e., angle-preserving), viewed as a map from $R^2$
to $R^2$.
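
As a rough numerical illustration of the Cauchy-Riemann condition, the sketch below estimates the partial derivatives of a function by central differences and measures how far the two Cauchy-Riemann equations are from holding at a point. The helper names and the step size `h` are assumptions chosen for illustration; a small residual is evidence for analyticity at that point, not a proof.

```python
def partials(f, x, y, h=1e-6):
    """Central-difference estimates of the partial derivatives of f(x + iy) in x and y."""
    fx = (f(complex(x + h, y)) - f(complex(x - h, y))) / (2 * h)
    fy = (f(complex(x, y + h)) - f(complex(x, y - h))) / (2 * h)
    return fx, fy

def cauchy_riemann_residual(f, x, y):
    """Return (|u_x - v_y|, |u_y + v_x|) for f = u + iv at the point x + iy."""
    fx, fy = partials(f, x, y)
    return abs(fx.real - fy.imag), abs(fy.real + fx.imag)

def square(z):
    return z * z            # analytic everywhere

def conjugate(z):
    return z.conjugate()    # nowhere analytic
```

For `square` both residuals are close to zero at any point, while for `conjugate` the first residual is close to 2, reflecting that complex conjugation fails the first Cauchy-Riemann equation.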


0.10 Countability and the Axiom of Choice 


You should know what it means for a set to be countably infinite. In 
particular, you should know that the integers and rationals are countably 
infinite while the real numbers are uncountably infinite. The statement
of the Axiom of Choice and the fact that it has many seemingly bizarre 
equivalences should also be known. 
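
To say that a set is countably infinite is to say that its elements can be listed one after another. As a minimal sketch for the positive rationals (the diagonal ordering used here is one common choice among many, and the function name is illustrative), the generator below walks the pairs (p, q) by increasing p + q and skips fractions that are not in lowest terms, so that every positive rational appears exactly once.

```python
from fractions import Fraction
from itertools import count, islice
from math import gcd

def positive_rationals():
    # Walk the diagonals p + q = 2, 3, 4, ... of the grid of pairs (p, q),
    # skipping pairs not in lowest terms so that no rational is repeated.
    for total in count(2):
        for p in range(1, total):
            q = total - p
            if gcd(p, q) == 1:
                yield Fraction(p, q)

first_ten = list(islice(positive_rationals(), 10))
```

The list begins 1, 1/2, 2, 1/3, 3, 1/4, 2/3, 3/2, 4, 1/5. Interleaving zero and the negatives of these values would list all the rationals.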


0.11 Algebra 


Groups, the basic object of study in abstract algebra, are the algebraic 
interpretations of geometric symmetries. One should know the basics about 
groups (at least to the level of the Sylow Theorem, which is a key tool for
understanding finite groups), rings and fields. You should also know Galois 
Theory, which provides the link between finite groups and the finding of 
the roots of a polynomial and hence shows the connections between high 
school and abstract algebra. Finally, you should know the basics behind 
representation theory, which is how one relates abstract groups to groups 
of matrices. 


0.12 Lebesgue Integration


You should know the basic ideas behind Lebesgue measure and integration,
at least to the level of the Lebesgue Dominated Convergence Theorem,
and the concept of sets of measure zero.


0.13 Fourier Analysis 


You should know how to find the Fourier series of a periodic function, the 
Fourier integral of a function, the Fourier transform, and how Fourier series
relate to Hilbert spaces. Further, you should see how Fourier transforms
can be used to simplify differential equations. 


0.14 Differential Equations 


Much of physics, economics, mathematics and other sciences comes down 
to trying to find solutions to differential equations. One should know that 
the goal in differential equations is to find an unknown function satisfying 
an equation involving derivatives. Subject to mild restrictions, there are 
always solutions to ordinary differential equations. This is most definitely 
not the case for partial differential equations, where even the existence of 
solutions is frequently unknown. You should also be familiar with the three 
traditional classes of partial differential equations: the heat equation, the 
wave equation and the Laplacian. 


0.15 Combinatorics and Probability Theory 


Both elementary combinatorics and basic probability theory reduce to
problems in counting. You should know that

\[ \binom{n}{k} = \frac{n!}{k!(n-k)!} \]

is the number of ways of choosing k elements from n elements. The relation
of $\binom{n}{k}$ to the binomial theorem for polynomials is useful to have handy for
many computations. Basic probability theory should be understood. In 
particular one should understand the terms: sample space, random vari- 
able (both its intuitions and its definition as a function), expected value 
and variance. One should definitely understand why counting arguments 
are critical for calculating probabilities of finite sample spaces. The link be- 
tween probability and integral calculus can be seen in the various versions 
of the Central Limit Theorem, the ideas of which should be known. 
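
The link between counting and the binomial theorem can be checked directly. The sketch below computes the number of ways of choosing k elements from n by the factorial formula and, separately, the coefficients of (x + y)^n by repeated multiplication by (x + y). The function names are illustrative.

```python
from math import comb, factorial

def choose(n, k):
    """Number of ways of choosing k elements from n, via the factorial formula."""
    return factorial(n) // (factorial(k) * factorial(n - k))

def expand_binomial(n):
    """Coefficients of (x + y)^n, built by multiplying by (x + y) n times."""
    coeffs = [1]
    for _ in range(n):
        # Multiplying by (x + y) adds each coefficient to its shifted neighbour.
        coeffs = [a + b for a, b in zip([0] + coeffs, coeffs + [0])]
    return coeffs
```

For n = 4 the expansion gives the coefficients 1, 4, 6, 4, 1, which agree with `choose(4, k)` for k = 0, ..., 4 and with the standard library's `math.comb`.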


0.16 Algorithms 


You should understand what is meant by the complexity of an algorithm, at 
least to the level of understanding the question P=NP. Basic graph theory 
should be known; for example, you should see why a tree is a natural struc- 
ture for understanding many algorithms. Numerical Analysis is the study of 
algorithms for approximating the answer to computations in mathematics. 
As an example, you should understand Newton’s method for approximating 
the roots of a polynomial. 
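
A minimal sketch of Newton's method for a polynomial follows. Starting from a guess x, it repeatedly replaces x by x - p(x)/p'(x). The coefficient ordering, tolerance and iteration cap are assumptions made for this illustration, and convergence is not guaranteed for every starting guess.

```python
def newton_root(coeffs, x0, tol=1e-12, max_iter=50):
    """Approximate a root of the polynomial with coefficients coeffs
    (highest degree first), starting from the guess x0."""
    x = x0
    for _ in range(max_iter):
        # Horner's rule evaluates p(x) and p'(x) in a single pass.
        p, dp = 0.0, 0.0
        for c in coeffs:
            dp = dp * x + p
            p = p * x + c
        if dp == 0:
            raise ZeroDivisionError("derivative vanished; try another starting guess")
        step = p / dp
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton's method did not converge")
```

For example, `newton_root([1, 0, -2], 1.0)` approximates the positive root of x^2 - 2, that is, the square root of 2.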


Chapter 1 


Linear Algebra 











Basic Object: Vector Spaces 
Basic Map: Linear Transformations 
Basic Goal: Equivalences for the Invertibility of Matrices 





1.1 Introduction 


Though a bit of an exaggeration, it can be said that a mathematical prob- 
lem can be solved only if it can be reduced to a calculation in linear algebra. 
And a calculation in linear algebra will reduce ultimately to the solving of 
a system of linear equations, which in turn comes down to the manipula- 
tion of matrices. Throughout this text and, more importantly, throughout 
mathematics, linear algebra is a key tool (or more accurately, a collection 
of intertwining tools) that is critical for doing calculations. 

The power of linear algebra lies not only in our ability to manipulate 
matrices in order to solve systems of linear equations. The abstraction of 
these concrete objects to the ideas of vector spaces and linear transforma- 
tions allows us to see the common conceptual links between many seemingly 
disparate subjects. (Of course, this is the advantage of any good abstrac- 
tion.) For example, the study of solutions to linear differential equations 
has, in part, the same feel as trying to model the hood of a car with cubic 
polynomials, since both the space of solutions to a linear differential equa- 
tion and the space of cubic polynomials that model a car hood form vector 
spaces. 

The key theorem of linear algebra, discussed in section six, gives many 
equivalent ways of telling when a system of n linear equations in n unknowns 
has a solution. Each of the equivalent conditions is important. What is 
remarkable and what gives linear algebra its oomph is that they are all the 
same.


1.2 The Basic Vector Space $R^n$


The quintessential vector space is $R^n$, the set of all n-tuples of real numbers

\[ \{ (x_1, \ldots, x_n) : x_i \in R \}. \]


As we will see in the next section, what makes this a vector space is that 
we can add together two n-tuples to get another n-tuple: 


\[ (x_1, \ldots, x_n) + (y_1, \ldots, y_n) = (x_1 + y_1, \ldots, x_n + y_n) \]

and that we can multiply each n-tuple by a real number $\lambda$:

\[ \lambda (x_1, \ldots, x_n) = (\lambda x_1, \ldots, \lambda x_n) \]

to get another n-tuple. Of course each n-tuple is usually called a vector
and the real numbers $\lambda$ are called scalars. When n = 2 and when n = 3
all of this reduces to the vectors in the plane and in space that most of us 
learned in high school. 

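The natural map from some $R^n$ to an $R^m$ is given by matrix
multiplication. Write a vector $x \in R^n$ as a column vector:

\[ x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \]

Similarly, we can write a vector in $R^m$ as a column vector with m entries.
Let A be an m x n matrix

\[ A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ \vdots & & & \vdots \\ a_{m1} & \cdots & \cdots & a_{mn} \end{pmatrix}. \]

Then Ax is the m-tuple:

\[ Ax = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ \vdots & & & \vdots \\ a_{m1} & \cdots & \cdots & a_{mn} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} a_{11} x_1 + \cdots + a_{1n} x_n \\ \vdots \\ a_{m1} x_1 + \cdots + a_{mn} x_n \end{pmatrix}. \]

For any two vectors x and y in $R^n$ and any two scalars $\lambda$ and $\mu$, we have

\[ A(\lambda x + \mu y) = \lambda A x + \mu A y. \]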
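
The linearity of matrix multiplication is easy to test on a small case. The sketch below uses plain Python lists for vectors and matrices. The particular 2 x 3 matrix, the vectors and the scalars are arbitrary choices made for illustration.

```python
def mat_vec(A, x):
    """Multiply the matrix A (a list of rows) by the column vector x."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def combo(lam, x, mu, y):
    """The linear combination lam * x + mu * y of two vectors."""
    return [lam * xi + mu * yi for xi, yi in zip(x, y)]

A = [[1, 2, 0], [-1, 3, 4]]      # a 2 x 3 matrix, so A maps R^3 to R^2
x, y = [1, 0, 2], [3, -1, 5]
lam, mu = 2, -3

lhs = mat_vec(A, combo(lam, x, mu, y))                  # A applied to lam*x + mu*y
rhs = combo(lam, mat_vec(A, x), mu, mat_vec(A, y))      # lam*(A x) + mu*(A y)
```

Both sides come out to the vector (-1, -28). A single example proves nothing, of course; the general identity follows from the distributive law applied to each entry of the product.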




In the next section we will use the linearity of matrix multiplication to 
motivate the definition for a linear transformation between vector spaces. 
Now to relate all of this to the solving of a system of linear equations. 


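Suppose we are given numbers $b_1, \ldots, b_m$ and numbers $a_{11}, \ldots, a_{mn}$. Our
goal is to find n numbers $x_1, \ldots, x_n$ that solve the following system of linear
equations:

\[ \begin{aligned} a_{11} x_1 + \cdots + a_{1n} x_n &= b_1 \\ &\;\;\vdots \\ a_{m1} x_1 + \cdots + a_{mn} x_n &= b_m. \end{aligned} \]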

Calculations in linear algebra will frequently reduce to solving a system of 
linear equations. When there are only a few equations, we can find the 
solutions by hand, but as the number of equations increases, the calcula- 
tions quickly turn from enjoyable algebraic manipulations into nightmares 
of notation. These nightmarish complications arise not from any single 
theoretical difficulty but instead stem solely from trying to keep track of 
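the many individual minor details. In other words, it is a problem in
bookkeeping.

Write

\[ b = \begin{pmatrix} b_1 \\ \vdots \\ b_m \end{pmatrix}, \qquad A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ \vdots & & & \vdots \\ a_{m1} & \cdots & \cdots & a_{mn} \end{pmatrix} \]

and our unknowns as

\[ x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \]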

Then we can rewrite our system of linear equations in the more visually 
appealing form of 
\[ Ax = b. \]


When m > n (when there are more equations than unknowns), we
expect there to be, in general, no solutions. For example, when m = 3 
and n = 2, this corresponds geometrically to the fact that three lines in 
a plane will usually have no common point of intersection. When m < n 
(when there are more unknowns than equations), we expect there to be, 
in general, many solutions. In the case when m = 2 and n = 3, this 
corresponds geometrically to the fact that two planes in space will usually 
intersect in an entire line. Much of the machinery of linear algebra deals 
with the remaining case when m = n.

Thus we want to find the $n \times 1$ column vector x that solves Ax = b,
where A is a given $n \times n$ matrix and b is a given $n \times 1$ column vector.
Suppose that the square matrix A has an inverse matrix $A^{-1}$ (which means
that $A^{-1}$ is also $n \times n$ and more importantly that $A^{-1}A = I$, with I the
identity matrix). Then our solution will be


\[ x = A^{-1}b \]


since


\[ Ax = A(A^{-1}b) = Ib = b. \]


Thus solving our system of linear equations comes down to understanding 
when the n x n matrix A has an inverse. (If an inverse matrix exists, then 
there are algorithms for calculating it.)

The key theorem of linear algebra, stated in section six, is in essence a 
list of many equivalences for when an n x n matrix has an inverse and is 
thus essential to understanding when a system of linear equations can be 
solved. 
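
To make the bookkeeping concrete, here is a minimal sketch of one such algorithm, Gauss-Jordan elimination, written with exact rational arithmetic. The function name `solve_square_system` and the small 2 x 2 system are our own illustrative choices, not something taken from the text; the sketch returns `None` when no pivot can be found, which is exactly the case in which A has no inverse.

```python
from fractions import Fraction


def solve_square_system(A, b):
    """Solve Ax = b for a square matrix A by Gauss-Jordan elimination.

    Returns the solution as a list of Fractions, or None when A has no
    inverse (so that a unique solution does not exist).
    """
    n = len(A)
    # Build the augmented matrix [A | b] with exact rational entries.
    M = [[Fraction(A[i][j]) for j in range(n)] + [Fraction(b[i])] for i in range(n)]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None  # no pivot available: A is not invertible
        M[col], M[pivot] = M[pivot], M[col]
        # Scale the pivot row so that the pivot entry becomes 1.
        p = M[col][col]
        M[col] = [entry / p for entry in M[col]]
        # Clear this column in every other row.
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]


# Hypothetical example system: 2x + y = 5, x + 3y = 10.
x = solve_square_system([[2, 1], [1, 3]], [5, 10])
print(x)  # [Fraction(1, 1), Fraction(3, 1)]
```

Exact fractions are used here only so that the test for a zero pivot is reliable; a floating point version would need a tolerance instead.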


1.3 Vector Spaces and Linear Transformations 


The abstract approach to studying systems of linear equations starts with 
the notion of a vector space. 


Definition 1.3.1 A set V is a vector space over the real numbers$^1$ $\mathbf{R}$ if
there are maps:


1. $\mathbf{R} \times V \to V$, denoted by $a \cdot v$ or $av$ for all real numbers $a$ and
elements $v$ in $V$,


2. $V \times V \to V$, denoted by $v + w$ for all elements $v$ and $w$ in the vector
space $V$,


with the following properties:
a) There is an element $0$ in $V$ such that $0 + v = v$ for all $v \in V$.
b) For each $v \in V$, there is an element $(-v) \in V$ with $v + (-v) = 0$.
c) For all $v, w \in V$, $v + w = w + v$.
d) For all $a \in \mathbf{R}$ and for all $v, w \in V$, we have that $a(v + w) = av + aw$.
e) For all $a, b \in \mathbf{R}$ and all $v \in V$, $a(bv) = (a \cdot b)v$.
f) For all $a, b \in \mathbf{R}$ and all $v \in V$, $(a + b)v = av + bv$.
g) For all $v \in V$, $1 \cdot v = v$.


$^1$ The real numbers can be replaced by the complex numbers and in fact by any field
(which will be defined in Chapter Eleven on algebra).

As a matter of notation, and to agree with common usage, the elements of
a vector space are called vectors and the elements of $\mathbf{R}$ (or whatever field
is being used) scalars. Note that the space $\mathbf{R}^n$ given in the last section
certainly satisfies these conditions.

The natural map between vector spaces is that of a linear transforma- 
tion. 


Definition 1.3.2 A linear transformation $T : V \to W$ is a function from
a vector space V to a vector space W such that for any real numbers $a_1$ and
$a_2$ and any vectors $v_1$ and $v_2$ in V, we have


\[ T(a_1v_1 + a_2v_2) = a_1T(v_1) + a_2T(v_2). \]


Matrix multiplication from an $\mathbf{R}^n$ to an $\mathbf{R}^m$ gives an example of a linear
transformation.


Definition 1.3.3 A subset U of a vector space V is a subspace of V if U 
is itself a vector space. 


In practice, it is usually easy to see if a subset of a vector space is in fact 
a subspace, by the following proposition, whose proof is left to the reader: 


Proposition 1.3.1 A nonempty subset U of a vector space V is a subspace of V if
U is closed under addition and scalar multiplication.


Given a linear transformation $T : V \to W$, there are naturally occurring
subspaces of both V and W.


Definition 1.3.4 If $T : V \to W$ is a linear transformation, then the kernel
of T is:
\[ \ker(T) = \{ v \in V : T(v) = 0 \} \]


and the image of T is
\[ \mathrm{Im}(T) = \{ w \in W : \text{there exists a } v \in V \text{ with } T(v) = w \}. \]


The kernel is a subspace of V, since if $v_1$ and $v_2$ are two vectors in the
kernel and if a and b are any two real numbers, then


\begin{align*}
T(av_1 + bv_2) &= aT(v_1) + bT(v_2) \\
 &= a \cdot 0 + b \cdot 0 \\
 &= 0.
\end{align*}


In a similar way we can show that the image of T is a subspace of W. 
If the only vector spaces that ever occurred were column vectors in $\mathbf{R}^n$,
then even this mild level of abstraction would be silly. This is not the case.
Here we look at only one example. Let $C^k[0,1]$ be the set of all real-valued
functions with domain the unit interval $[0,1]$:


\[ f : [0,1] \to \mathbf{R} \]


such that the kth derivative of f exists and is continuous. Since the sum of
any two such functions and a multiple of any such function by a scalar will
still be in $C^k[0,1]$, we have a vector space. Though we will officially define
dimension next section, $C^k[0,1]$ will be infinite dimensional (and thus definitely
not some $\mathbf{R}^n$). We can view the derivative as a linear transformation
from $C^k[0,1]$ to those functions with one less derivative, $C^{k-1}[0,1]$:


\[ \frac{d}{dx} : C^k[0,1] \to C^{k-1}[0,1]. \]
The kernel of $\frac{d}{dx}$ consists of those functions with $\frac{df}{dx} = 0$, namely constant
functions.
Now consider the differential equation


\[ \frac{d^2 f}{dx^2} + 3\frac{df}{dx} + 2f = 0. \]
Let T be the linear transformation:
\[ T = \frac{d^2}{dx^2} + 3\frac{d}{dx} + 2I : C^2[0,1] \to C^0[0,1]. \]


The problem of finding a solution f(x) to the original differential equation 
can now be translated to finding an element of the kernel of T. This suggests 
the possibility (which indeed is true) that the language of linear algebra can 
be used to understand solutions to (linear) differential equations. 
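
As a quick check of this translation, reading the differential equation above as $f'' + 3f' + 2f = 0$, one can verify by hand that particular functions lie in the kernel of T. The two functions below are our own choice of illustration and are not singled out in the text:

```latex
\begin{align*}
T(e^{-x})  &= e^{-x} - 3e^{-x} + 2e^{-x} = 0, \\
T(e^{-2x}) &= 4e^{-2x} - 6e^{-2x} + 2e^{-2x} = 0.
\end{align*}
```

Since the kernel of a linear transformation is a subspace, every combination $ae^{-x} + be^{-2x}$ with real numbers a and b is then also a solution, which is the linear algebra language doing real work.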


1.4 Bases, Dimension, and Linear Transformations as Matrices


Our next goal is to define the dimension of a vector space. 


Definition 1.4.1 A set of vectors $(v_1, \ldots, v_n)$ form a basis for the vector
space V if given any vector v in V, there are unique scalars $a_1, \ldots, a_n \in \mathbf{R}$
with $v = a_1v_1 + \cdots + a_nv_n$.


Definition 1.4.2 The dimension of a vector space V, denoted by dim(V),
is the number of elements in a basis.

As it is far from obvious that the number of elements in a basis will
always be the same, no matter which basis is chosen, in order to make 
the definition of the dimension of a vector space well-defined we need the 
following theorem (which we will not prove): 


Theorem 1.4.1 All bases of a vector space V have the same number of 
elements. 


For $\mathbf{R}^n$, the usual basis is
\[ \{(1, 0, \ldots, 0), (0, 1, 0, \ldots, 0), \ldots, (0, \ldots, 0, 1)\}. \]


Thus $\mathbf{R}^n$ is n dimensional. Of course if this were not true, the above definition
of dimension would be wrong and we would need another. This is
an example of the principle mentioned in the introduction. We have a good 
intuitive understanding of what dimension should mean for certain specific 
examples: a line needs to be one dimensional, a plane two dimensional and 
space three dimensional. We then come up with a sharp definition. If this 
definition gives the “correct” answer for our three already understood ex- 
amples, we are somewhat confident that the definition has indeed captured 
what is meant by, in this case, dimension. Then we can apply the definition 
to examples where our intuitions fail. 
Linked to the idea of a basis is: 


Definition 1.4.3 Vectors $(v_1, \ldots, v_n)$ in a vector space V are linearly independent
if whenever


\[ a_1v_1 + \cdots + a_nv_n = 0, \]
it must be the case that the scalars $a_1, \ldots, a_n$ are all zero.


Intuitively, a collection of vectors is linearly independent if they all point
in different directions. A basis consists then of a collection of linearly
independent vectors that span the vector space, where by span we mean:


Definition 1.4.4 A set of vectors $(v_1, \ldots, v_n)$ span the vector space V if
given any vector v in V, there are scalars $a_1, \ldots, a_n \in \mathbf{R}$ with
$v = a_1v_1 + \cdots + a_nv_n$.


Our goal now is to show how all linear transformations $T : V \to W$
between finite-dimensional spaces can be represented as matrix multiplica- 
tion, provided we fix bases for the vector spaces V and W. 

First fix a basis $\{v_1, \ldots, v_n\}$ for V and a basis $\{w_1, \ldots, w_m\}$ for W. Before
looking at the linear transformation T, we need to show how each element
of the n-dimensional space V can be represented as a column vector in $\mathbf{R}^n$
and how each element of the m-dimensional space W can be represented
as a column vector of $\mathbf{R}^m$. Given any vector v in V, by the definition of
basis, there are unique real numbers $a_1, \ldots, a_n$ with


\[ v = a_1v_1 + \cdots + a_nv_n. \]
We thus represent the vector v with the column vector:


\[ \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix}. \]


Similarly, for any vector w in W, there are unique real numbers $b_1, \ldots, b_m$
with


\[ w = b_1w_1 + \cdots + b_mw_m. \]
Here we represent w as the column vector
\[ \begin{pmatrix} b_1 \\ \vdots \\ b_m \end{pmatrix}. \]

Note that we have established a correspondence between vectors in V and
W and column vectors in $\mathbf{R}^n$ and $\mathbf{R}^m$, respectively. More technically, we can
show that V is isomorphic to $\mathbf{R}^n$ (meaning that there is a one-one, onto
linear transformation from V to $\mathbf{R}^n$) and that W is isomorphic to $\mathbf{R}^m$,
though it must be emphasized that the actual correspondence only exists 
after a basis has been chosen (which means that while the isomorphism 
exists, it is not canonical; this is actually a big deal, as in practice it is 
unfortunately often the case that no basis is given to us). 

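We now want to represent a linear transformation $T : V \to W$ as an
$m \times n$ matrix A. For each basis vector $v_i$ in the vector space V, $T(v_i)$ will
be a vector in W. Thus there will exist real numbers $a_{1i}, \ldots, a_{mi}$ such that


\[ T(v_i) = a_{1i}w_1 + \cdots + a_{mi}w_m. \]


We want to see that the linear transformation T will correspond to the
$m \times n$ matrix


\[ A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ \vdots & & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}. \]
Given any vector v in V, with $v = a_1v_1 + \cdots + a_nv_n$, we have
\begin{align*}
T(v) &= T(a_1v_1 + \cdots + a_nv_n) \\
 &= a_1T(v_1) + \cdots + a_nT(v_n) \\
 &= a_1(a_{11}w_1 + \cdots + a_{m1}w_m) + \cdots \\
 &\quad + a_n(a_{1n}w_1 + \cdots + a_{mn}w_m).
\end{align*}
But under the correspondences of the vector spaces with the various column
spaces, this can be seen to correspond to the matrix multiplication of A
times the column vector corresponding to the vector v:


\[ \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ \vdots & & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix} \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix} = \begin{pmatrix} b_1 \\ \vdots \\ b_m \end{pmatrix}. \]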


Note that if $T : V \to V$ is a linear transformation from a vector space to
itself, then the corresponding matrix will be n x n, a square matrix. 
Given different bases for the vector spaces V and W, the matrix asso- 
ciated to the linear transformation T will change. A natural problem is to 
determine when two matrices actually represent the same linear transfor- 
mation, but under different bases. This will be the goal of section seven. 
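
The recipe of this section (the ith column of the matrix holds the coordinates of the image of the ith basis vector) can be sketched in a few lines. This is only an illustration under simplifying assumptions: both spaces are taken to be column spaces with their usual bases, and the helper names `matrix_of` and `mat_vec` as well as the sample map `T` are hypothetical choices of ours.

```python
def matrix_of(T, n):
    """Return the matrix of a linear map T : R^n -> R^m in the usual bases.

    Column i of the matrix holds the coordinates of T(e_i), where e_i is
    the i-th usual basis vector of R^n.
    """
    columns = []
    for i in range(n):
        e_i = [1 if j == i else 0 for j in range(n)]
        columns.append(T(e_i))
    m = len(columns[0])
    # Transpose the list of columns into a list of rows.
    return [[columns[j][i] for j in range(n)] for i in range(m)]


def mat_vec(A, v):
    """Multiply the matrix A (a list of rows) by the column vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


# Hypothetical linear map T : R^2 -> R^3, chosen only for illustration.
def T(v):
    x, y = v
    return [x + 2 * y, 3 * x, y - x]


A = matrix_of(T, 2)
print(A)                   # [[1, 2], [3, 0], [-1, 1]]
print(mat_vec(A, [4, 5]))  # [14, 12, 1]
print(T([4, 5]))           # [14, 12, 1]
```

The last two lines agree, which is the statement that applying T and multiplying by A are the same operation once bases are fixed.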


1.5 The Determinant 


Our next task is to give a definition for the determinant of a matrix. In fact, 
we will give three alternative descriptions of the determinant. All three are 
equivalent; each has its own advantages. 

Our first method is to define the determinant of a 1 x 1 matrix and then 
to define recursively the determinant of an n x n matrix. 

Since 1 x 1 matrices are just numbers, the following should not at all 
be surprising: 


Definition 1.5.1 The determinant of a $1 \times 1$ matrix (a) is the real-valued
function
\[ \det(a) = a. \]


This should not yet seem significant. 
Before giving the definition of the determinant for a general $n \times n$ matrix,
we need a little notation. For an $n \times n$ matrix


\[ A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ \vdots & & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}, \]
denote by $A_{ij}$ the $(n-1) \times (n-1)$ matrix obtained from A by deleting
the ith row and the jth column. For example, if
$A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}$, then $A_{12} = (a_{21})$. Similarly if
\[ A = \begin{pmatrix} 2 & 3 & 5 \\ 6 & 4 & 9 \\ 7 & 1 & 8 \end{pmatrix}, \quad \text{then} \quad A_{12} = \begin{pmatrix} 6 & 9 \\ 7 & 8 \end{pmatrix}. \]

Since we have a definition for the determinant for $1 \times 1$ matrices, we
will now assume by induction that we know the determinant of any
$(n-1) \times (n-1)$ matrix and use this to find the determinant of an $n \times n$ matrix.


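Definition 1.5.2 Let A be an $n \times n$ matrix. Then the determinant of A is


\[ \det(A) = \sum_{k=1}^{n} (-1)^{k+1} a_{1k} \det(A_{1k}). \]


Thus for $A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}$ we have


\[ \det(A) = a_{11}\det(A_{11}) - a_{12}\det(A_{12}) = a_{11}a_{22} - a_{12}a_{21}, \]


which is what most of us think of as the determinant. The determinant of
our above $3 \times 3$ matrix is:


\[ \det\begin{pmatrix} 2 & 3 & 5 \\ 6 & 4 & 9 \\ 7 & 1 & 8 \end{pmatrix} = 2\det\begin{pmatrix} 4 & 9 \\ 1 & 8 \end{pmatrix} - 3\det\begin{pmatrix} 6 & 9 \\ 7 & 8 \end{pmatrix} + 5\det\begin{pmatrix} 6 & 4 \\ 7 & 1 \end{pmatrix}. \]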


While this definition is indeed an efficient means to describe the determi- 
nant, it obscures most of the determinant’s uses and intuitions. 
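
The inductive definition translates almost word for word into a recursive function. The sketch below is ours, with hypothetical helper names `minor` and `det`, and it takes the 3 x 3 example of this section to be the matrix with rows (2, 3, 5), (6, 4, 9), (7, 1, 8). It is meant to mirror the definition, not to be fast: the recursion does on the order of n! multiplications, so it is only sensible for small matrices.

```python
def minor(A, i, j):
    """Return the matrix obtained from A by deleting row i and column j (0-indexed)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]


def det(A):
    """Determinant by expansion along the first row, as in the inductive definition."""
    if len(A) == 1:
        return A[0][0]
    # With 0-indexed k, the sign (-1)**k matches (-1)^(k+1) in the 1-indexed formula.
    return sum((-1) ** k * A[0][k] * det(minor(A, 0, k)) for k in range(len(A)))


print(minor([[2, 3, 5], [6, 4, 9], [7, 1, 8]], 0, 1))  # [[6, 9], [7, 8]]
print(det([[2, 3, 5], [6, 4, 9], [7, 1, 8]]))          # -19
```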

The second way we can describe the determinant has built into it the 
key algebraic properties of the determinant. It highlights function-theoretic 
properties of the determinant. 

Denote the $n \times n$ matrix A as $A = (A_1, \ldots, A_n)$, where $A_i$ denotes the
ith column:


\[ A_i = \begin{pmatrix} a_{1i} \\ \vdots \\ a_{ni} \end{pmatrix}. \]


Definition 1.5.3 The determinant of A is defined as the unique real-valued
function
\[ \det : \text{Matrices} \to \mathbf{R} \]


satisfying:
a) $\det(A_1, \ldots, \lambda A_k, \ldots, A_n) = \lambda \det(A_1, \ldots, A_k, \ldots, A_n)$.
b) $\det(A_1, \ldots, A_k + \lambda A_i, \ldots, A_n) = \det(A_1, \ldots, A_n)$ for $k \neq i$.
c) $\det(\text{Identity matrix}) = 1$.


Thus, treating each column vector of a matrix as a vector in $\mathbf{R}^n$, the determinant
can be viewed as a special type of function from $\mathbf{R}^n \times \cdots \times \mathbf{R}^n$
to the real numbers.

In order to be able to use this definition, we would have to prove that
such a function on the space of matrices, satisfying conditions a through c, 
even exists and then that it is unique. Existence can be shown by checking 
that our first (inductive) definition for the determinant satisfies these con- 
ditions, though it is a painful calculation. The proof of uniqueness can be 
found in almost any linear algebra text. 

The third definition for the determinant is the most geometric but is 
also the most vague. We must think of an $n \times n$ matrix A as a linear
transformation from $\mathbf{R}^n$ to $\mathbf{R}^n$. Then A will map the unit cube in $\mathbf{R}^n$ to
some different object (a parallelepiped). The unit cube has, by definition,
a volume of one. 


Definition 1.5.4 The determinant of the matrix A is the signed volume
of the image of the unit cube. 


This is not well-defined, as the very method of defining the volume of the 
image has not been described. In fact, most would define the signed volume 
of the image to be the number given by the determinant using one of the 
two earlier definitions. But this can all be made rigorous, though at the
price of losing much of the geometric insight.


Let’s look at some examples: the matrix $A = \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix}$ takes the unit
square to


[Figure: the image of the unit square under A]


Since the area is doubled, we must have
\[ \det(A) = 2. \]


Signed volume means that if the orientations of the edges of the unit 
cube are changed, then we must have a negative sign in front of the volume. 
For example, consider the matrix $A = \begin{pmatrix} -2 & 0 \\ 0 & 1 \end{pmatrix}$. Here the image is


[Figure: the image of the unit square under A]


Note that the orientations of the sides are flipped. Since the area is still
doubled, the definition will force


\[ \det(A) = -2. \]


To rigorously define orientation is somewhat tricky (we do it in Chapter 
Six), but its meaning is straightforward. 
The determinant has many algebraic properties. For example, 


Lemma 1.5.1 If A and B are $n \times n$ matrices, then
\[ \det(AB) = \det(A)\det(B). \]


This can be proven by either a long calculation or by concentrating on the 
definition of the determinant as the change of volume of a unit cube. 


1.6 The Key Theorem of Linear Algebra 


Here is the key theorem of linear algebra. (Note: we have yet to define
eigenvalues and eigenvectors, but we will in section eight.) 


Theorem 1.6.1 (Key Theorem) Let A be an $n \times n$ matrix. Then the
following are equivalent:


1. A is invertible.
2. $\det(A) \neq 0$.
3. $\ker(A) = 0$.
4. If b is a column vector in $\mathbf{R}^n$, there is a unique column vector x
in $\mathbf{R}^n$ satisfying $Ax = b$.
5. The columns of A are linearly independent $n \times 1$ column vectors.
6. The rows of A are linearly independent $1 \times n$ row vectors.
7. The transpose $A^T$ of A is invertible. (Here, if $A = (a_{ij})$, then
$A^T = (a_{ji})$).
8. All of the eigenvalues of A are nonzero.

We can restate this theorem in terms of linear transformations.


Theorem 1.6.2 (Key Theorem) Let $T : V \to V$ be a linear transformation.
Then the following are equivalent:


1. T is invertible.
2. $\det(T) \neq 0$, where the determinant is defined by a choice of basis
on V.
3. $\ker(T) = 0$.
4. If b is a vector in V, there is a unique vector v in V satisfying
$T(v) = b$.
5. For any basis $v_1, \ldots, v_n$ of V, the image vectors $T(v_1), \ldots, T(v_n)$
are linearly independent.
6. For any basis $v_1, \ldots, v_n$ of V, if S denotes the transpose linear
transformation of T, then the image vectors $S(v_1), \ldots, S(v_n)$ are
linearly independent.
7. The transpose of T is invertible. (Here the transpose is defined by a
choice of basis on V).
8. All of the eigenvalues of T are nonzero.


In order to make the correspondence between the two theorems clear, we 
must worry about the fact that we only have definitions of the determinant 
and the transpose for matrices, not for linear transformations. While we 
do not show it, both notions can be extended to linear transformations, 
provided a basis is chosen (in fact, provided we choose an inner product, 
which will be defined in Chapter Thirteen on Fourier series). But note that 
while the actual value det(T) will depend on a fixed basis, the condition 
that $\det(T) \neq 0$ does not. Similar statements hold for conditions (6) and
(7). A proof is the goal of exercise 7, where you are asked to find any linear 
algebra book and then fill in the proof. It is unlikely that the linear algebra 
book will have this result as it is stated here. The act of translating is in 
fact part of the purpose of making this an exercise. 
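
It can help to watch the theorem fail all at once. The matrix below is a small example of our own, not one from the text, for which none of the equivalent conditions holds:

```latex
\[
A = \begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix}, \qquad
\det(A) = 1 \cdot 4 - 2 \cdot 2 = 0, \qquad
A \begin{pmatrix} 2 \\ -1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.
\]
```

Here the kernel is not zero, the second column is twice the first, the second row is twice the first, and 0 is an eigenvalue. Finally, every vector of the form Ax has its second entry equal to twice its first, so a system such as Ax = b with b having entries 1 and 0 has no solution at all.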

Each of the equivalences is important. Each can be studied on its own
merits. It is remarkable that they are the same.


1.7 Similar Matrices


Recall that given a basis for an n dimensional vector space V, we can 
represent a linear transformation 


\[ T : V \to V \]


as an $n \times n$ matrix A. Unfortunately, if you choose a different basis for V, the
matrix representing the linear transformation T will be quite different from
the original matrix A. This section’s goal is to find a clean criterion for
when two matrices actually represent the same linear transformation but
under different choices of bases.


Definition 1.7.1 Two $n \times n$ matrices A and B are similar if there is an
invertible matrix C such that


\[ A = C^{-1}BC. \]


We want to see that two matrices are similar precisely when they represent
the same linear transformation. Choose two bases for the vector space
V, say $\{v_1, \ldots, v_n\}$ (the v basis) and $\{w_1, \ldots, w_n\}$ (the w basis). Let A be
the matrix representing the linear transformation T for the v basis and let
B be the matrix representing the linear transformation for the w basis. We
want to construct the matrix C so that $A = C^{-1}BC$.

Recall that given the v basis, we can write each vector $z \in V$ as an $n \times 1$
column vector as follows: we know that there are unique scalars $a_1, \ldots, a_n$
with

\[ z = a_1v_1 + \cdots + a_nv_n. \]


We then write z, with respect to the v basis, as the column vector:


\[ \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix}. \]
Similarly, there are unique scalars $b_1, \ldots, b_n$ so that
\[ z = b_1w_1 + \cdots + b_nw_n, \]
meaning that with respect to the w basis, the vector z is the column vector:
\[ \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix}. \]

The desired matrix C will be the matrix such that
\[ C \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix} = \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix}. \]
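If $C = (c_{ij})$, then the entries $c_{ij}$ are precisely the numbers which yield:
\[ v_j = c_{1j}w_1 + \cdots + c_{nj}w_n. \]
(To see this, take z to be the basis vector $v_j$: its column vector with respect to
the v basis has a 1 in the jth place and zeros elsewhere, so its column vector with
respect to the w basis is the jth column of C.)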


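Then, for A and B to represent the same linear transformation, we need
the diagram:


\[
\begin{array}{ccc}
\mathbf{R}^n & \xrightarrow{\;A\;} & \mathbf{R}^n \\
C \downarrow & & \downarrow C \\
\mathbf{R}^n & \xrightarrow{\;B\;} & \mathbf{R}^n
\end{array}
\]
to commute, meaning that $CA = BC$ or
\[ A = C^{-1}BC, \]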


as desired. 

Determining when two matrices are similar is a type of result that shows 
up throughout math and physics. Regularly you must choose some coordi- 
nate system (some basis) in order to write down anything at all, but the 
underlying math or physics that you are interested in is independent of the 
initial choice. The key question becomes: what is preserved when the coor- 
dinate system is changed? Similar matrices allow us to start to understand 
these questions. 
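
Here is a minimal numerical sketch of the change of basis, under assumptions of our own: V is taken to be the plane, the v basis is the usual basis, and the w basis consists of the hypothetical vectors (1, 1) and (1, 2). The matrix C converts coordinates with respect to the v basis into coordinates with respect to the w basis, and the helper names `mat_mul` and `inverse_2x2` are illustrative only.

```python
from fractions import Fraction


def mat_mul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def inverse_2x2(M):
    """Inverse of a 2 x 2 matrix with nonzero determinant, in exact arithmetic."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]


# A represents T in the v basis, taken here to be the usual basis of the plane.
A = [[1, 2], [3, 4]]
# The columns of W are the w basis vectors (1, 1) and (1, 2), in v coordinates.
W = [[1, 1], [1, 2]]

C = inverse_2x2(W)             # converts v coordinates into w coordinates
B = mat_mul(mat_mul(C, A), W)  # column j of B: w coordinates of T(w_j)

print(B == [[-1, -1], [4, 6]])                      # True
print(mat_mul(C, A) == mat_mul(B, C))               # True: CA = BC
print(mat_mul(mat_mul(inverse_2x2(C), B), C) == A)  # True: A = C^{-1} B C
```

The matrices A and B look unrelated entry by entry, yet they describe the same linear transformation, which is the point of the definition of similarity.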


1.8 Eigenvalues and Eigenvectors 


In the last section we saw that two matrices represent the same linear trans- 
formation, under different choices of bases, precisely when they are similar. 
This does not tell us, though, how to choose a basis for a vector space so 
that a linear transformation has a particularly decent matrix representa- 
tion. For example, the diagonal matrix 


\[ A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{pmatrix} \]


is similar to the matrix
\[ B = \frac{1}{4}\begin{pmatrix} 1 & -4 & -5 \\ 1 & 8 & -1 \\ 5 & 4 & 15 \end{pmatrix}, \]
but all recognize the simplicity of A as compared to B. (By the way, it is
not obvious that A and B are similar; I started with A, chose a nonsingular
matrix C and then used the software package Mathematica to compute
$C^{-1}AC$ to get B. I did not just suddenly “see” that A and B are similar.
No, I rigged it to be so.)

One of the purposes behind the following definitions for eigenvalues 
and eigenvectors is to give us tools for picking out good bases. There are, 
though, many other reasons to understand eigenvalues and eigenvectors. 


Definition 1.8.1 Let $T : V \to V$ be a linear transformation. Then a
nonzero vector $v \in V$ will be an eigenvector of T with eigenvalue $\lambda$, a
scalar, if

\[ T(v) = \lambda v. \]


For an $n \times n$ matrix A, a nonzero column vector $x \in \mathbf{R}^n$ will be an eigenvector
with eigenvalue $\lambda$, a scalar, if


\[ Ax = \lambda x. \]


Geometrically, a vector v is an eigenvector of the linear transformation T
with eigenvalue $\lambda$ if T stretches v by a factor of $\lambda$.


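For example,
\[ \begin{pmatrix} -2 & -2 \\ 6 & 5 \end{pmatrix} \begin{pmatrix} 1 \\ -2 \end{pmatrix} = 2 \begin{pmatrix} 1 \\ -2 \end{pmatrix}, \]


and thus 2 is an eigenvalue and $\begin{pmatrix} 1 \\ -2 \end{pmatrix}$ an eigenvector for the linear transformation
represented by the $2 \times 2$ matrix $\begin{pmatrix} -2 & -2 \\ 6 & 5 \end{pmatrix}$.

Luckily there is an easy way to describe the eigenvalues of a square
matrix, which will allow us to see that the eigenvalues of a matrix are
preserved under a similarity transformation.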


Proposition 1.8.1 A number $\lambda$ will be an eigenvalue of a square matrix
A if and only if $\lambda$ is a root of the polynomial


\[ P(t) = \det(tI - A). \]


The polynomial $P(t) = \det(tI - A)$ is called the characteristic polynomial
of the matrix A.
Proof: Suppose that $\lambda$ is an eigenvalue of A, with eigenvector v. Then
$Av = \lambda v$, or

\[ \lambda v - Av = 0, \]
where the zero on the right hand side is the zero column vector. Then,
putting in the identity matrix I, we have


\[ 0 = \lambda v - Av = (\lambda I - A)v. \]


Thus the matrix $\lambda I - A$ has a nontrivial kernel, since it contains v. By the key
theorem of linear algebra, this happens precisely when


\[ \det(\lambda I - A) = 0, \]


which means that $\lambda$ is a root of the characteristic polynomial
$P(t) = \det(tI - A)$. Since all of these directions can be reversed, we have our
theorem. $\Box$
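
For a 2 x 2 matrix the characteristic polynomial can be written out by hand, and its roots found with the quadratic formula. The sketch below does this for the example matrix of this section, which we read as having rows (-2, -2) and (6, 5); the function name `eigenvalues_2x2` is our own, and the sketch only reports real roots.

```python
import math


def eigenvalues_2x2(A):
    """Real eigenvalues of a 2 x 2 matrix, found as the roots of det(tI - A).

    For a 2 x 2 matrix with rows (a, b) and (c, d),
    det(tI - A) = t^2 - (a + d) t + (ad - bc).
    Returns the roots in increasing order, or an empty tuple when the
    roots are not real.
    """
    (a, b), (c, d) = A
    trace = a + d
    det = a * d - b * c
    disc = trace * trace - 4 * det
    if disc < 0:
        return ()
    root = math.sqrt(disc)
    return ((trace - root) / 2, (trace + root) / 2)


A = [[-2, -2], [6, 5]]
print(eigenvalues_2x2(A))  # (1.0, 2.0)

# Check directly that (1, -2) is an eigenvector with eigenvalue 2.
v = [1, -2]
Av = [A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1]]
print(Av == [2 * v[0], 2 * v[1]])  # True
```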


Theorem 1.8.1 Let A and B be similar matrices. Then the characteristic 
polynomial of A is equal to the characteristic polynomial of B. 


Proof: For A and B to be similar, there must be an invertible matrix C
with $A = C^{-1}BC$. Then


\begin{align*}
\det(tI - A) &= \det(tI - C^{-1}BC) \\
 &= \det(tC^{-1}C - C^{-1}BC) \\
 &= \det(C^{-1}(tI - B)C) \\
 &= \det(C^{-1})\det(tI - B)\det(C) \\
 &= \det(tI - B)
\end{align*}


using that $1 = \det(C^{-1}C) = \det(C^{-1})\det(C)$. $\Box$
Since the characteristic polynomials for similar matrices are the same, 
this means that the eigenvalues must be the same. 


Corollary 1.8.1.1 The eigenvalues for similar matrices are equal. 


Thus to see if two matrices are similar, one can compute to see if the 
eigenvalues are equal. If they are not, the matrices are not similar. Unfor- 
tunately in general, having equal eigenvalues does not force matrices to be 
similar. For example, the matrices 


4=(5 7) 
B=(5 2) 


both have eigenvalues 1 and 2, but they are not similar. (This can be shown 
by assuming that there is an invertible two-by-two matrix C with $C^{-1}AC = B$
and then showing that $\det(C) = 0$, contradicting C’s invertibility.)

Since the characteristic polynomial P(t) does not change under a similarity
transformation, the coefficients of P(t) will also not change under a
similarity transformation. But since the coefficients of P(t) will themselves 
be (complicated) polynomials of the entries of the matrix A, we now have 
certain special polynomials of the entries of A that are invariant under a 
similarity transformation. One of these coefficients we have already seen 
in another guise, namely the determinant of A, as the following theorem 
shows. This theorem will more importantly link the eigenvalues of A to the 
determinant of A. 


Theorem 1.8.2 Let $\lambda_1, \ldots, \lambda_n$ be the eigenvalues, counted with multiplicity,
of a matrix A. Then


\[ \det(A) = \lambda_1 \cdots \lambda_n. \]


Before proving this theorem, we need to discuss the idea of counting 
eigenvalues “with multiplicity”. The difficulty is that a polynomial can have 
a root that must be counted more than once (e.g., the polynomial $(x - 2)^2$
has the single root 2 which we want to count twice). This can happen
in particular to the characteristic polynomial. For example, consider the
matrix


\[ \begin{pmatrix} 5 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & 4 \end{pmatrix} \]


which has as its characteristic polynomial the cubic
\[ (t - 5)(t - 5)(t - 4). \]


For the above theorem, we would list the eigenvalues as 4, 5, and 5, hence 
counting the eigenvalue 5 twice. 

Proof: Since the eigenvalues $\lambda_1, \ldots, \lambda_n$ are the (complex) roots of the
characteristic polynomial $\det(tI - A)$, we have


\[ (t - \lambda_1) \cdots (t - \lambda_n) = \det(tI - A). \]
Setting t = 0, we have
\[ (-1)^n \lambda_1 \cdots \lambda_n = \det(-A). \]


In the matrix $(-A)$, each column of A is multiplied by $(-1)$. Using the
second definition of a determinant, we can factor out each of these $(-1)$s,
to get

\[ (-1)^n \lambda_1 \cdots \lambda_n = (-1)^n \det(A) \]
and our result. $\Box$

Now finally to turn back to determining a “good” basis for representing 
a linear transformation. The measure of “goodness” is how close the matrix 
is to being a diagonal matrix. We will restrict ourselves to a special, but 
quite prevalent, class: symmetric matrices. By symmetric, we mean that 
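if $A = (a_{ij})$, then we require that the entry at the ith row and jth column
($a_{ij}$) must equal the entry at the jth row and the ith column ($a_{ji}$). Thus


\[ \begin{pmatrix} 5 & 3 & 4 \\ 3 & 5 & 2 \\ 4 & 2 & 4 \end{pmatrix} \]
is symmetric but

\[ \begin{pmatrix} 5 & 2 & 3 \\ 6 & 5 & 3 \\ 2 & 18 & 4 \end{pmatrix} \]


is not.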


Theorem 1.8.3 If A is a symmetric matrix, then there is a matrix B similar
to A which is not only diagonal but with the entries along the diagonal
being precisely the eigenvalues of A.


Proof: The proof basically rests on showing that the eigenvectors for A 
form a basis in which A becomes our desired diagonal matrix. We will 
assume that the eigenvalues for A are distinct, as technical difficulties occur 
when there are eigenvalues with multiplicity. 


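Let $v_1, v_2, \ldots, v_n$ be the eigenvectors for the matrix A, with corresponding
eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$. Form the matrix
\[ C = (v_1, v_2, \ldots, v_n), \]


where the ith column of C is the column vector $v_i$. We will show that
the matrix $C^{-1}AC$ will satisfy our theorem. Thus we want to show that
$C^{-1}AC$ equals the diagonal matrix


\[ B = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix}. \]
Denote
\[ e_1 = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \quad e_2 = \begin{pmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{pmatrix}, \quad \ldots, \quad e_n = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{pmatrix}. \]

Then the above diagonal matrix B is the unique matrix with $Be_i = \lambda_i e_i$,
for all i. Our choice for the matrix C now becomes clear as we observe that
for all i, $Ce_i = v_i$. Then we have


\[ C^{-1}ACe_i = C^{-1}Av_i = C^{-1}(\lambda_i v_i) = \lambda_i C^{-1}v_i = \lambda_i e_i, \]


giving us the theorem. $\Box$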
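
The construction in the proof can be carried out by hand for a small symmetric matrix. The sketch below uses a hypothetical matrix of our own, with rows (1, 2) and (2, 1), whose eigenvalues -1 and 3 are distinct; the eigenvectors were found by inspection and the helper names are illustrative.

```python
from fractions import Fraction


def mat_mul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def inverse_2x2(M):
    """Inverse of a 2 x 2 matrix with nonzero determinant, in exact arithmetic."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]


# Hypothetical symmetric matrix with distinct eigenvalues -1 and 3.
A = [[1, 2], [2, 1]]
# Eigenvectors: A(1, -1) = -(1, -1) and A(1, 1) = 3(1, 1).
v1, v2 = [1, -1], [1, 1]
# C has the eigenvectors as its columns.
C = [[v1[0], v2[0]], [v1[1], v2[1]]]

B = mat_mul(mat_mul(inverse_2x2(C), A), C)
print(B == [[-1, 0], [0, 3]])  # True: the eigenvalues sit on the diagonal

# The product of the eigenvalues equals det(A), as in Theorem 1.8.2.
print(B[0][0] * B[1][1] == A[0][0] * A[1][1] - A[0][1] * A[1][0])  # True
```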

This is of course not the end of the story. For nonsymmetric matrices, 
there are other canonical ways of finding “good” similar matrices, such as the
Jordan canonical form, the upper triangular form and rational canonical 
form. 


1.9 Dual Vector Spaces 


It pays to study functions. In fact, functions appear at times to be more 
basic than their domains. In the context of linear algebra, the natural class 
of functions is linear transformations, or linear maps from one vector space 
to another. Among all real vector spaces, there is one that seems simplest, 
namely the one-dimensional vector space of the real numbers R. This leads 
us to examine a special type of linear transformation on a vector space, 
those that map the vector space to the real numbers, the set of which we 
will call the dual space. Dual spaces regularly show up in mathematics. 
Let V be a vector space. The dual vector space, or dual space, is:


\begin{align*}
V^* &= \{\text{linear maps from } V \text{ to the real numbers } \mathbf{R}\} \\
 &= \{v^* : V \to \mathbf{R} \mid v^* \text{ is linear}\}.
\end{align*}


You can check that the dual space $V^*$ is itself a vector space.
Let $T : V \to W$ be a linear transformation. Then we can define a
natural linear transformation


\[ T^* : W^* \to V^* \]


from the dual of W to the dual of V as follows. Let $w^* \in W^*$. Then
given any vector w in the vector space W, we know that $w^*(w)$ will be a
real number. We need to define $T^*$ so that $T^*(w^*) \in V^*$. Thus given any
vector $v \in V$, we need $T^*(w^*)(v)$ to be a real number. Simply define


\[ T^*(w^*)(v) = w^*(T(v)). \]


By the way, note that the direction of the linear transformation
$T : V \to W$ is indeed reversed to $T^* : W^* \to V^*$. Also by “natural”, we do
not mean that the map $T^*$ is “obvious” but instead that it can be uniquely
associated to the original linear transformation T.

Such a dual map shows up in many different contexts. For example, if
X and Y are topological spaces with a continuous map $F : X \to Y$ and if
C(X) and C(Y) denote the sets of continuous real-valued functions on X
and Y, then here the dual map


\[ F^* : C(Y) \to C(X) \]


is defined by $F^*(g)(x) = g(F(x))$, where g is a continuous map on Y.

Attempts to abstractly characterize all such dual maps were a major 
theme of mid-twentieth century mathematics and can be viewed as one of 
the beginnings of category theory. 
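
Since the dual map of a linear transformation is nothing more than composition, it is easy to model in code. In the sketch below, elements of a dual space are represented as functions from vectors to numbers; the names `dual_map`, `T` and `w_star` and the particular maps are hypothetical choices of ours.

```python
def dual_map(T):
    """Given a linear map T : V -> W, return the dual map from W* to V*.

    Elements of the dual spaces are modelled as functions that take a
    vector and return a number; the dual map sends w_star to the
    composition of w_star with T.
    """
    def T_star(w_star):
        return lambda v: w_star(T(v))
    return T_star


# Hypothetical linear map T from the plane to three-dimensional space.
def T(v):
    x, y = v
    return [x + 2 * y, 3 * x, y - x]


# Hypothetical element of the dual of three-dimensional space.
def w_star(w):
    a, b, c = w
    return a - b + 2 * c


v_star = dual_map(T)(w_star)  # an element of the dual of the plane
print(v_star([1, 0]))  # -4
print(v_star([0, 1]))  # 4
print(v_star([4, 5]))  # 4, which equals 4 * (-4) + 5 * 4
```

Note how the direction reverses: T takes vectors of the plane forward, while the dual map pulls functions on the target space back to functions on the plane.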


1.10 Books 


Mathematicians have been using linear algebra since they have been doing 
mathematics, but the styles, methods and the terminologies have shifted. 
For example, if you look in a college course catalogue in 1900 or proba- 
bly even 1950, there will be no undergraduate course called linear algebra. 
Instead there were courses such as “Theory of Equations” or simply “Alge- 
bra”. As seen in one of the more popular textbooks in the first part of the 
twentieth century, Maxime Bôcher’s Introduction to Higher Algebra [10], the
concern was with concretely solving systems of linear equations. The results
were written in an algorithmic style. Modern day computer programmers
usually find this style of text far easier to understand than current math
books. In the 1930s, a fundamental change in the way algebraic topics
were taught occurred with the publication of Van der Waerden’s Modern
Algebra [113][114], which was based on lectures of Emmy Noether and Emil
Artin. Here a more abstract approach was taken. The first true modern
day linear algebra text was Halmos’ Finite-dimensional Vector Spaces [52].
Here the emphasis is on the idea of a vector space from the very beginning. 
Today there are many beginning texts. Some start with systems of linear 
equations and then deal with vector spaces, others reverse the process. A 
long time favorite of many is Strang’s Linear Algebra and Its Applications 
[109]. As a graduate student, you should volunteer to teach or TA linear 
algebra as soon as possible. 


1.11 Exercises 


1. Let $L : V \to W$ be a linear transformation between two vector spaces.
Show that
\[ \dim(\ker(L)) + \dim(\mathrm{Im}(L)) = \dim(V). \]

2. Consider the set of all polynomials in one variable with real coefficients
of degree less than or equal to three.

a. Show that this set forms a vector space of dimension four. 

b. Find a basis for this vector space. 

c. Show that differentiating a polynomial is a linear transformation. 

d. Given the basis chosen in part (b), write down the matrix represen- 
tative of the derivative. 
3. Let A and B be two $n \times n$ invertible matrices. Prove that


\[ (AB)^{-1} = B^{-1}A^{-1}. \]


4. Let


[Matrix A: entries not legible in the source]


Find a matrix C so that $C^{-1}AC$ is a diagonal matrix.

5. Denote the vector space of all functions


\[ f : \mathbf{R} \to \mathbf{R} \]


which are infinitely differentiable by $C^\infty(\mathbf{R})$. This space is called the space
of smooth functions.

a. Show that $C^\infty(\mathbf{R})$ is infinite dimensional.

b. Show that differentiation is a linear transformation:


\[ \frac{d}{dx} : C^\infty(\mathbf{R}) \to C^\infty(\mathbf{R}). \]


c. For a real number $\lambda$, find an eigenvector for $\frac{d}{dx}$ with eigenvalue $\lambda$.


6. Let V be a finite dimensional vector space. Show that the dual vector 
space V* has the same dimension as V. 

7. Find a linear algebra text. Use it to prove the key theorem of linear 
algebra. Note that this is a long exercise but is to be taken seriously. 


Chapter 2 


$\epsilon$ and $\delta$ Real Analysis


Basic Object: The Real Numbers 


Basic Maps: Continuous and Differentiable Functions 
Basic Goal: The Fundamental Theorem of Calculus 





While the basic intuitions behind differentiation and integration were known 
by the late 1600s, allowing for a wealth of physical and mathematical appli- 
cations to develop during the 1700s, it was only in the 1800s that sharp, rig- 
orous definitions were finally given. The key concept is that of a limit, from 
which follow the definitions for differentiation and integration and rigorous 
proofs of their basic properties. Far from a mere exercise in pedantry, this 
rigorization actually allowed mathematicians to discover new phenomena. 
For example, Karl Weierstrass discovered a function that was continuous 
everywhere but differentiable nowhere. In other words, there is a function 
with no breaks but with sharp edges at every point. Key to his proof is the 
need for limits to be applied to sequences of functions, leading to the idea 
of uniform convergence. 

We will define limits and then use this definition to develop the ideas 
of continuity, differentiation and integration of functions. Then we will 
show how differentiation and integration are intimately connected in the 
Fundamental Theorem of Calculus. Finally we will finish with uniform 
convergence of functions and Weierstrass’ example. 


2.1 Limits 


Definition 2.1.1 A function $f : \mathbb{R} \to \mathbb{R}$ has a limit $L$ at the point $a$ if
given any real number $\epsilon > 0$ there is a real number $\delta > 0$ such that for all
real numbers $x$ with

$$0 < |x - a| < \delta,$$

we have

$$|f(x) - L| < \epsilon.$$

This is denoted by

$$\lim_{x \to a} f(x) = L.$$


Intuitively, the function $f(x)$ should have a limit $L$ at a point $a$ if, for
numbers $x$ near $a$, the value of the function $f(x)$ is close to the number $L$.
In other words, to guarantee that $f(x)$ be close to $L$, we can require that
$x$ is close to $a$. Thus if we want $f(x)$ to be within an arbitrary $\epsilon > 0$ of
the number $L$ (i.e., if we want $|f(x) - L| < \epsilon$), we must be able to specify
how close to $a$ we must force $x$ to be. Therefore, given a number $\epsilon > 0$ (no
matter how small), we must be able to find a number $\delta > 0$ so that if $x$ is
within $\delta$ of $a$, we have that $f(x)$ is within an $\epsilon$ of $L$. This is precisely what
the definition says, in symbols.

For example, if the above definition of a limit is to make sense, it must
yield that

$$\lim_{x \to 2} x^2 = 4.$$

We will check this now. It must be emphasized that we would be foolish
to show that $x^2$ approaches 4 as $x$ approaches 2 by actually using the
definition. We are again doing the common trick of using an example whose
answer we already know to check the reasonableness of a new definition.
Thus for any $\epsilon > 0$, we must find a $\delta > 0$ so that if $0 < |x - 2| < \delta$, we will
have

$$|x^2 - 4| < \epsilon.$$

Set

$$\delta = \min\left(\frac{\epsilon}{5}, 1\right).$$

As often happens, the initial work in finding the correct expression for $\delta$ is
hidden. Also, the ‘5’ in the denominator will be seen not to be critical. Let
$0 < |x - 2| < \delta$. We want $|x^2 - 4| < \epsilon$. Now

$$|x^2 - 4| = |x - 2| \cdot |x + 2|.$$

Since $x$ is within $\delta$ of 2,

$$|x + 2| < (2 + \delta) + 2 = 4 + \delta \leq 5.$$

Thus

$$|x^2 - 4| = |x - 2| \cdot |x + 2| < 5 \cdot |x - 2| < 5 \cdot \frac{\epsilon}{5} = \epsilon.$$

We are done.
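
The choice of $\delta$ above can also be probed numerically. The following is only a sanity check under stated assumptions, not a proof: the helper names `delta_for` and `check_limit` are made up for this sketch, and only finitely many sample points $x$ with $0 < |x - 2| < \delta$ are tested.

```python
def delta_for(eps):
    """The delta chosen in the text for f(x) = x**2 at a = 2."""
    return min(eps / 5.0, 1.0)


def check_limit(eps, samples=1000):
    """Sample points x with 0 < |x - 2| < delta and test |x**2 - 4| < eps."""
    delta = delta_for(eps)
    for i in range(1, samples + 1):
        offset = delta * i / (samples + 1)  # strictly between 0 and delta
        for x in (2.0 - offset, 2.0 + offset):
            if not abs(x * x - 4.0) < eps:
                return False
    return True


for eps in (10.0, 1.0, 0.1, 1e-3):
    print(eps, delta_for(eps), check_limit(eps))
```

Every sampled point should pass, in line with the inequality chain derived above.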
2.2 Continuity


Definition 2.2.1 A function $f : \mathbb{R} \to \mathbb{R}$ is continuous at $a$ if

$$\lim_{x \to a} f(x) = f(a).$$

Of course, any intuition about continuous functions should capture the 
notion that a continuous function cannot have any breaks in its graph. In 
other words, you can graph a continuous function without having to lift 
your pencil from the page. (As with any sweeping intuition, this one will 
break down if pushed too hard.) 


[Figure: the graph of a continuous function (left) and of a function that is not continuous (right).]





In $\epsilon$ and $\delta$ notation, the definition of continuity is:


Definition 2.2.2 A function $f : \mathbb{R} \to \mathbb{R}$ is continuous at $a$ if given any
$\epsilon > 0$, there is some $\delta > 0$ such that for all $x$ with $0 < |x - a| < \delta$, we have

$$|f(x) - f(a)| < \epsilon.$$

For an example, we will write down a function that is clearly not continuous
at the origin 0, and use this function to check the reasonableness of the
definition.
Let

$$f(x) = \begin{cases} 1 & \text{if } x > 0 \\ -1 & \text{if } x \leq 0. \end{cases}$$

Note that the graph of $f(x)$ has a break in it at the origin.
We want to capture this break by showing that

$$\lim_{x \to 0} f(x) \neq f(0).$$

Now $f(0) = -1$. Let $\epsilon = 1$ and let $\delta > 0$ be any positive number. Then for
any $x$ with $0 < x < \delta$, we have $f(x) = 1$. Then

$$|f(x) - f(0)| = |1 - (-1)| = 2 > 1 = \epsilon.$$

Thus for all positive $x < \delta$,

$$|f(x) - f(0)| > \epsilon.$$

Hence, for any $\delta > 0$, there are $x$ with

$$|x - 0| < \delta$$

but

$$|f(x) - f(0)| > \epsilon.$$

This function is indeed not continuous.
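
The argument above produces, for each $\delta$, an explicit point that violates the $\epsilon$ condition. A minimal sketch of that search follows; the name `witness` and the choice $x = \delta/2$ are illustrative assumptions (any $x$ with $0 < x < \delta$ would do).

```python
def f(x):
    """The step function from the text: 1 for x > 0, -1 for x <= 0."""
    return 1.0 if x > 0 else -1.0


EPS = 1.0


def witness(delta):
    """Return an x with |x - 0| < delta but |f(x) - f(0)| > EPS."""
    x = delta / 2.0  # any point with 0 < x < delta works
    assert abs(x - 0.0) < delta
    assert abs(f(x) - f(0.0)) > EPS
    return x


for delta in (1.0, 1e-3, 1e-9):
    x = witness(delta)
    print(delta, x, abs(f(x) - f(0.0)))
```

However small $\delta$ is taken, the gap $|f(x) - f(0)|$ stays at 2, so no $\delta$ can work for $\epsilon = 1$.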


2.3 Differentiation


Definition 2.3.1 A function $f : \mathbb{R} \to \mathbb{R}$ is differentiable at $a$ if

$$\lim_{x \to a} \frac{f(x) - f(a)}{x - a}$$

exists. This limit is called the derivative and is denoted by (among many
other symbols) $f'(a)$ or $\frac{df}{dx}(a)$.


One of the key intuitive meanings of a derivative is that it should give the 
slope of the tangent line to the curve y = f(x) at the point a. While 
logically the current definition of a tangent line must include the above 
definition of derivative, in pictures the tangent line is of course: 





[Figure: the curve y = f(x) with its tangent line at the point a.]

The idea behind the definition is that we can compute the slope of a
line defined by any two points in the plane. In particular, for any $x \neq a$,
the slope of the secant line through the points $(a, f(a))$ and $(x, f(x))$ will
be

$$\frac{f(x) - f(a)}{x - a}.$$

[Figure: the secant line through (a, f(a)) and (x, f(x)), with slope (f(x) - f(a))/(x - a).]

We now let x approach a. The corresponding secant lines will approach the 
tangent line. Thus the slopes of the secant lines must approach the slope 
of the tangent line. 
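
To see the secant slopes settle down in a concrete case, here is a small sketch. The function $f(x) = x^2$, the point $a = 1$ and the helper name `secant_slope` are choices made for this illustration only; the slopes should approach $f'(1) = 2$.

```python
def secant_slope(f, a, x):
    """Slope of the secant line through (a, f(a)) and (x, f(x)), for x != a."""
    return (f(x) - f(a)) / (x - a)


def f(x):
    return x * x


a = 1.0
# Let x approach a from the right: x = a + 10**(-k) for k = 1, ..., 6.
slopes = [secant_slope(f, a, a + 10.0 ** (-k)) for k in range(1, 7)]
for k, s in enumerate(slopes, start=1):
    print(k, s)
```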


[Figure: secant lines approaching the tangent line.]





Hence the definition for the slope of the tangent line should be:

$$f'(a) = \lim_{x \to a} \frac{f(x) - f(a)}{x - a}.$$

Part of the power of derivatives (and why they can be taught to high
school seniors and first year college students) is that there is a whole calcu- 
lational machinery to differentiation, allowing us to usually avoid the actual 
taking of a limit. 

We now look at an example of a function that does not have a derivative
at the origin, namely

$$f(x) = |x|.$$

This function has a sharp point at the origin and thus no apparent tangent
line there. We will show that the definition yields that $f(x) = |x|$ is indeed
not differentiable at $x = 0$. Thus we want to show that

$$\lim_{x \to 0} \frac{f(x) - f(0)}{x - 0}$$

does not exist. Luckily

$$\frac{f(x) - f(0)}{x - 0} = \frac{|x|}{x} = \begin{cases} 1, & x > 0 \\ -1, & x < 0, \end{cases}$$

which we have already shown in the last section to not have a limit as $x$
approaches 0.

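
The two one-sided behaviours of this difference quotient are easy to tabulate. In the sketch below the name `quotient` and the sample points $\pm 10^{-k}$ are assumptions made for illustration.

```python
def quotient(x):
    """Difference quotient (f(x) - f(0)) / (x - 0) for f(x) = |x|, x != 0."""
    return (abs(x) - abs(0.0)) / (x - 0.0)


# Approach 0 from the right and from the left.
right = [quotient(10.0 ** (-k)) for k in range(1, 6)]
left = [quotient(-(10.0 ** (-k))) for k in range(1, 6)]
print(right)
print(left)
```

The quotients from the right and from the left stay at two different values, so they cannot share a common limit.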
2.4 Integration 


Intuitively the integral of a positive function $f(x)$ with domain $a \leq x \leq b$
should be the area under the curve $y = f(x)$ above the $x$-axis.

[Figure: the area under the curve y = f(x) between a and b.]

When the function $f(x)$ is not everywhere positive, then its integral should
be the area under the positive part of the curve $y = f(x)$ minus the area
above the negative part of $y = f(x)$.






[Figure: a curve y = f(x) with positive area where it lies above the x-axis and negative area where it lies below.]


Of course this is hardly rigorous, as we do not yet even have a good defini- 
tion for area. 

The main idea is that the area of a rectangle with height a and width b 
is ab. 





To find the area under a curve $y = f(x)$ we first find the area of various
rectangles contained under the curve and then the area of various rectangles
just outside the curve.


We then make the rectangles thinner and thinner, as in:

[Figure: thinner and thinner rectangles approximating the area under the curve.]

We take the limits, which should result in the area under the curve.

Now for the more technically correct definitions. We consider a real- 
valued function f(x) with domain the closed interval [a,b]. We first want 
to divide, or partition, the interval [a,b] into little segments that will be 
the widths of the approximating rectangles. For each positive integer n, let 





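$$\Delta t = \frac{b - a}{n}$$

and

$$a = t_0, \quad t_1 = t_0 + \Delta t, \quad t_2 = t_1 + \Delta t, \quad \ldots, \quad t_n (= b) = t_{n-1} + \Delta t.$$

For example, on the interval $[0, 2]$ with $n = 4$, we have $\Delta t = \frac{2 - 0}{4} = \frac{1}{2}$ and

$$t_0 = 0, \quad t_1 = \frac{1}{2}, \quad t_2 = 1, \quad t_3 = \frac{3}{2}, \quad t_4 = 2.$$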


On each interval $[t_{k-1}, t_k]$, choose points $l_k$ and $u_k$ such that for all points
$t$ on $[t_{k-1}, t_k]$, we have

$$f(l_k) \leq f(t)$$

and

$$f(u_k) \geq f(t).$$

We make these choices in order to guarantee that the rectangle with
base $[t_{k-1}, t_k]$ and height $f(l_k)$ is just under the curve $y = f(x)$ and that
the rectangle with base $[t_{k-1}, t_k]$ and height $f(u_k)$ is just outside the curve
$y = f(x)$.

[Figure: over the interval from t_{k-1} to t_k, the rectangle of height f(l_k) under the curve (left) and the rectangle of height f(u_k) outside the curve (right).]


Definition 2.4.1 Let $f(x)$ be a real-valued function defined on the closed
interval $[a, b]$. For each positive integer $n$, let the lower sum of $f(x)$ be

$$L(f, n) = \sum_{k=1}^{n} f(l_k) \Delta t$$

and the upper sum be

$$U(f, n) = \sum_{k=1}^{n} f(u_k) \Delta t.$$


Note that the lower sum L(f,n) is the sum of the areas of the rectangles 
below our curve while the upper sum U(f, n) is the sum of the areas of the 
rectangles sticking out above our curve. 

Now we can define the integral. 


Definition 2.4.2 A real-valued function $f(x)$ with domain the closed interval
$[a, b]$ is said to be integrable if the following two limits exist and are
equal:

$$\lim_{n \to \infty} L(f, n) = \lim_{n \to \infty} U(f, n).$$

If these limits are equal, we denote the limit by $\int_a^b f(x)\,dx$ and call it the
integral of $f(x)$.
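
As a rough illustration of the lower and upper sums, the sketch below handles only the special case of a function that is increasing on $[a, b]$. That restriction is an assumption made here so that the points $l_k$ and $u_k$ can be taken to be the left and right endpoints of each subinterval; the name `lower_upper_sums` and the test function $f(x) = x^2$ on $[0, 1]$ are likewise illustrative.

```python
def lower_upper_sums(f, a, b, n):
    """Return (L(f, n), U(f, n)) for a function f that is increasing on [a, b].

    For an increasing function the minimum on [t_{k-1}, t_k] is attained at
    l_k = t_{k-1} and the maximum at u_k = t_k.
    """
    dt = (b - a) / n
    t = [a + k * dt for k in range(n + 1)]
    lower = sum(f(t[k - 1]) * dt for k in range(1, n + 1))
    upper = sum(f(t[k]) * dt for k in range(1, n + 1))
    return lower, upper


for n in (4, 40, 400, 4000):
    print(n, lower_upper_sums(lambda x: x * x, 0.0, 1.0, n))
```

As $n$ grows the two sums squeeze together, with the integral of $x^2$ over $[0, 1]$ trapped between them.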


While from pictures it does seem that the above definition will capture 
the notion of an area under a curve, almost any explicit attempt to actually 
calculate an integral will be quite difficult. The goal of the next section, 
the Fundamental Theorem of Calculus, is to see how the integral (an area- 
finding device) is linked to the derivative (a slope-finding device). This will 
actually allow us to compute many integrals. 


2.5 The Fundamental Theorem of Calculus 


Given a real-valued function $f(x)$ defined on the closed interval $[a, b]$ we can
use the above definition of integral to define a new function, via setting:

$$F(x) = \int_a^x f(t)\,dt.$$

We use the variable $t$ inside the integral sign since the variable $x$ is
already being used as the independent variable for the function $F(x)$. Thus
the value of $F(x)$ is the number that is the (signed) area under the curve
$y = f(x)$ from the endpoint $a$ to the value $x$.

[Figure: the area F(x) under the curve y = f(x) from a to x.]

The amazing fact is that the derivative of this new function $F(x)$ will simply
be the original function $f(x)$. This means that in order to find the integral
of $f(x)$, you should, instead of fussing with upper and lower sums, simply
try to find a function whose derivative is $f(x)$.

All of this is contained in: 


Theorem 2.5.1 (Fundamental Theorem of Calculus) Let $f(x)$ be
a real-valued continuous function defined on the closed interval $[a, b]$ and
define

$$F(x) = \int_a^x f(t)\,dt.$$

Then:
a) The function $F(x)$ is differentiable and

$$\frac{dF(x)}{dx} = \frac{d}{dx} \int_a^x f(t)\,dt = f(x)$$

and
b) If $G(x)$ is a real-valued differentiable function defined on the
closed interval $[a, b]$ whose derivative is:

$$\frac{dG(x)}{dx} = f(x),$$

then

$$\int_a^b f(x)\,dx = G(b) - G(a).$$
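
Both parts of the theorem can be checked numerically in a simple case. The sketch below is not part of the proof and rests on several assumptions: the integral is approximated by a midpoint rule (the helper `integral` is written for this illustration), the test function is $f = \cos$ with antiderivative $G = \sin$, and the derivative of $F$ is approximated by a difference quotient with a small fixed $h$.

```python
import math


def integral(f, a, b, n=20000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    dt = (b - a) / n
    return sum(f(a + (k + 0.5) * dt) for k in range(n)) * dt


f = math.cos
a, b = 0.0, 2.0


def F(x):
    """F(x) = integral of f from a to x (approximated)."""
    return integral(f, a, x)


x, h = 1.0, 1e-4
part_a = (F(x + h) - F(x)) / h  # should be close to f(x)
part_b = integral(f, a, b)      # should be close to G(b) - G(a) with G = sin

print(part_a, f(x))
print(part_b, math.sin(b) - math.sin(a))
```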


First to sketch part a: We want to show that for all $x$ in the interval $[a, b]$,
the following limit exists and equals $f(x)$:

$$\lim_{h \to 0} \frac{F(x + h) - F(x)}{h} = f(x).$$

Note that we have mildly reformulated the definition of the derivative, from
$\lim_{x \to x_0} (f(x) - f(x_0))/(x - x_0)$ to $\lim_{h \to 0} (f(x + h) - f(x))/h$. These are
equivalent. Also, for simplicity, we will only show this for $x$ in the open
interval $(a, b)$ and take the limit only for positive $h$. Consider

$$\frac{F(x + h) - F(x)}{h} = \frac{\int_a^{x+h} f(t)\,dt - \int_a^x f(t)\,dt}{h} = \frac{\int_x^{x+h} f(t)\,dt}{h}.$$

[Figure: the area F(x + h) - F(x) under the curve between x and x + h.]

On the interval $[x, x + h]$, for each $h$ define $l_h$ and $u_h$ so that for all points
$t$ on $[x, x + h]$, we have

$$f(l_h) \leq f(t)$$

and

$$f(u_h) \geq f(t).$$


(Note that we are, in a somewhat hidden fashion, using that a continuous
function on an interval like $[x, x + h]$ will have points such as $l_h$ and $u_h$. In
the chapter on point set topology, we will make this explicit, by seeing that
on a compact set, such as $[x, x + h]$, a continuous function must achieve
both its maximum and minimum.)

[Figure: the points l_h and u_h in the interval from x to x + h.]

Then we have

$$f(l_h) h \leq \int_x^{x+h} f(t)\,dt \leq f(u_h) h.$$

Dividing by $h > 0$ gives us:

$$f(l_h) \leq \frac{\int_x^{x+h} f(t)\,dt}{h} \leq f(u_h).$$

Now both the $l_h$ and the $u_h$ approach the point $x$ as $h$ approaches zero.
Since $f(x)$ is continuous, we have that

$$\lim_{h \to 0} f(l_h) = \lim_{h \to 0} f(u_h) = f(x)$$

and our result.
Turn to part b: Here we are given a function $G(x)$ whose derivative is:

$$\frac{dG(x)}{dx} = f(x).$$

Keep the notation of part a, namely that $F(x) = \int_a^x f(t)\,dt$. Note that
$F(a) = 0$ and

$$\int_a^b f(t)\,dt = F(b) = F(b) - F(a).$$

By part a, we know that the derivative of $F(x)$ is the function $f(x)$. Thus
the derivatives of $F(x)$ and $G(x)$ agree, meaning that

$$\frac{d(F(x) - G(x))}{dx} = f(x) - f(x) = 0.$$

But a function whose derivative is always zero must be a constant. (We
have not shown this. It is quite reasonable, as the only way the slope of the
tangent can always be zero is if the graph of the function is a horizontal
line; the proof does take some work.) Thus there is a constant $c$ such that

$$F(x) = G(x) + c.$$

Then

$$\int_a^b f(t)\,dt = F(b) = F(b) - F(a) = (G(b) + c) - (G(a) + c) = G(b) - G(a)$$

as desired.


2.6 Pointwise Convergence of Functions

Definition 2.6.1 Let $f_n : [a, b] \to \mathbb{R}$ be a sequence of functions

$$f_1(x), f_2(x), f_3(x), \ldots$$

defined on an interval $[a, b] = \{x : a \leq x \leq b\}$. This sequence $\{f_n(x)\}$ will
converge pointwise to a function

$$f(x) : [a, b] \to \mathbb{R}$$

if for all $\alpha$ in $[a, b]$,

$$\lim_{n \to \infty} f_n(\alpha) = f(\alpha).$$

In $\epsilon$ and $\delta$ notation, we would say that $\{f_n(x)\}$ converges pointwise to
$f(x)$ if for all $\alpha$ in $[a, b]$ and given any $\epsilon > 0$, there is a positive integer $N$
such that for all $n \geq N$, we have $|f(\alpha) - f_n(\alpha)| < \epsilon$.

Intuitively, a sequence of functions $f_n(x)$ will converge pointwise to a
function $f(x)$ if, given any $\alpha$, eventually (for huge $n$) the numbers $f_n(\alpha)$
become arbitrarily close to the number $f(\alpha)$. The importance of a good
notion for convergence of functions stems from the frequent practice of only 
approximately solving a problem and then using the approximation to un- 
derstand the true solution. Unfortunately, pointwise convergence is not as 
useful or as powerful as the next section’s topic, uniform convergence, in 
that the pointwise limit of reasonable functions (e.g., continuous or inte- 
grable functions) does not guarantee the reasonableness of the limit, as we 
will see in the next example. 

Here we show that the pointwise limit of continuous functions need not 
be continuous. For each positive integer n, set 


$$f_n(x) = x^n$$

for all $x$ on $[0, 1]$.
Set

$$f(x) = \begin{cases} 1, & x = 1 \\ 0, & 0 \leq x < 1. \end{cases}$$

Clearly $f(x)$ is not continuous at the endpoint $x = 1$ while all of the
functions $f_n(x) = x^n$ are continuous on the entire interval. But we will see that
the sequence $\{f_n(x)\}$ does indeed converge pointwise to $f(x)$.

Fix $\alpha$ in $[0, 1]$. If $\alpha = 1$, then $f_n(1) = 1^n = 1$ for all $n$. Then

$$\lim_{n \to \infty} f_n(1) = \lim_{n \to \infty} 1 = 1 = f(1).$$

Now let $0 \leq \alpha < 1$. We will use (without proving) the fact that for any
such number $\alpha$ less than 1, the limit of $\alpha^n$ will approach 0 as $n$ approaches $\infty$.
In particular,

$$\lim_{n \to \infty} f_n(\alpha) = \lim_{n \to \infty} \alpha^n = 0 = f(\alpha).$$


Thus the pointwise limit of a sequence of continuous functions need not be 
continuous. 


2.7 Uniform Convergence


Definition 2.7.1 A sequence of functions $f_n : [a, b] \to \mathbb{R}$ will converge
uniformly to a function $f : [a, b] \to \mathbb{R}$ if given any $\epsilon > 0$, there is a positive
integer $N$ such that for all $n \geq N$, we have

$$|f(x) - f_n(x)| < \epsilon$$

for all points $x$.


The intuition is that if we put an $\epsilon$-tube around the function $y = f(x)$,
the functions $y = f_n(x)$ will eventually fit inside this band.

The key here is that the same $\epsilon$ and $N$ will work for all $x$. This is not
the case in the definition of pointwise convergence, where the choice of $N$
depends on the number $x$.
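
For the earlier sequence $f_n(x) = x^n$ on $[0, 1)$, whose pointwise limit is 0, the dependence of $N$ on the point can be made visible by a direct search. The helper name `first_n` and the choice $\epsilon = 0.01$ are assumptions made for this sketch.

```python
def first_n(x, eps):
    """Smallest n >= 1 with x**n < eps, for 0 <= x < 1 (found by direct search)."""
    n = 1
    while x ** n >= eps:
        n += 1
    return n


eps = 0.01
for x in (0.5, 0.9, 0.99, 0.999):
    print(x, first_n(x, eps))
```

The required index grows without bound as $x$ approaches 1, which is exactly why no single $N$ serves every point and the convergence is not uniform.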

Almost all of the desirable properties of the functions in the sequence 
will be inherited by the limit. The major exception is differentiability, but 
even here a partial result is true. As an example of how these arguments 
work, we will show 


Theorem 2.7.1 Let $f_n : [a, b] \to \mathbb{R}$ be a sequence of continuous functions
converging uniformly to a function $f(x)$. Then $f(x)$ will be continuous.


Proof: We need to show that for all $\alpha$ in $[a, b]$,

$$\lim_{x \to \alpha} f(x) = f(\alpha).$$

Thus, given any $\epsilon > 0$, we must find some $\delta > 0$ such that for $0 < |x - \alpha| < \delta$,
we have

$$|f(x) - f(\alpha)| < \epsilon.$$

By uniform convergence, there is a positive integer $N$ so that

$$|f(x) - f_N(x)| < \frac{\epsilon}{3}$$

for all $x$. (The reason for the $\frac{\epsilon}{3}$ will be seen in a moment.)

By assumption each function $f_N(x)$ is continuous at the point $\alpha$. Thus
there is a $\delta > 0$ such that for $0 < |x - \alpha| < \delta$, we have

$$|f_N(x) - f_N(\alpha)| < \frac{\epsilon}{3}.$$

Now to show that for $0 < |x - \alpha| < \delta$, we will have

$$|f(x) - f(\alpha)| < \epsilon.$$

We will use the trick of adding appropriate terms which sum to zero and
then applying the triangle inequality ($|A + B| \leq |A| + |B|$). We have

$$|f(x) - f(\alpha)| = |f(x) - f_N(x) + f_N(x) - f_N(\alpha) + f_N(\alpha) - f(\alpha)|$$
$$\leq |f(x) - f_N(x)| + |f_N(x) - f_N(\alpha)| + |f_N(\alpha) - f(\alpha)|$$
$$< \frac{\epsilon}{3} + \frac{\epsilon}{3} + \frac{\epsilon}{3} = \epsilon$$

and we are done. $\Box$
We can now make sense out of series (infinite sums) of functions. 


Definition 2.7.2 Let $f_1(x), f_2(x), \ldots$ be a sequence of functions. The series
of functions

$$f_1(x) + f_2(x) + \cdots = \sum_{k=1}^{\infty} f_k(x)$$

converges uniformly to a function $f(x)$ if the sequence of partial sums:
$f_1(x)$, $f_1(x) + f_2(x)$, $f_1(x) + f_2(x) + f_3(x)$, $\ldots$ converges uniformly to $f(x)$.


In terms of $\epsilon$ and $\delta$'s, the infinite series of functions $\sum_{k=1}^{\infty} f_k(x)$ converges
uniformly to $f(x)$ if given any $\epsilon > 0$ there is a positive integer $N$ such that
for all $n \geq N$,

$$\left| f(x) - \sum_{k=1}^{n} f_k(x) \right| < \epsilon,$$

for all $x$.
We have 


Theorem 2.7.2 If each function $f_k(x)$ is continuous and if $\sum_{k=1}^{\infty} f_k(x)$
converges uniformly to $f(x)$, then $f(x)$ must be continuous.


This follows from the fact that the finite sum of continuous functions is 
continuous and the previous theorem. 

The writing of a function as a series of uniformly converging (simpler) 
functions is a powerful method of understanding and working with func- 
tions. It is the key idea behind the development of both Taylor series and 
Fourier series (which is the topic of Chapter Thirteen). 


2.8 The Weierstrass M-Test 


If we are interested in infinite series of functions $\sum_{k=1}^{\infty} f_k(x)$, then we must
be interested in knowing when the series converges uniformly. Luckily the
Weierstrass M-test provides a straightforward means for determining uniform
convergence. As we will see, the key is that this theorem reduces the
question of uniform convergence of $\sum_{k=1}^{\infty} f_k(x)$ to a question of when an
infinite series of numbers converges, for which beginning calculus provides
many tools, such as the ratio test, the root test, the comparison test, the
integral test, etc.


Theorem 2.8.1 Let $\sum_{k=1}^{\infty} f_k(x)$ be a series of functions, with each function
$f_k(x)$ defined on a subset $A$ of the real numbers. Suppose $\sum_{k=1}^{\infty} M_k$ is
a series of numbers such that:


1. $0 \leq |f_k(x)| \leq M_k$, for all $x \in A$.
2. The series $\sum_{k=1}^{\infty} M_k$ converges.


Then $\sum_{k=1}^{\infty} f_k(x)$ converges uniformly and absolutely.


By absolute convergence, we mean that the series of absolute values
$\sum_{k=1}^{\infty} |f_k(x)|$ also converges uniformly.

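Proof: To show uniform convergence, we must show that, given any $\epsilon > 0$,
there exists an integer $N$ such that for all $n \geq N$, we have

$$\left| \sum_{k=n}^{\infty} f_k(x) \right| < \epsilon,$$

for all $x \in A$. Whether or not $\sum_{k=n}^{\infty} f_k(x)$ converges, we certainly have

$$\left| \sum_{k=n}^{\infty} f_k(x) \right| \leq \sum_{k=n}^{\infty} |f_k(x)|.$$

Since $\sum_{k=1}^{\infty} M_k$ converges, we know that we can find an $N$ so that for all
$n \geq N$, we have

$$\sum_{k=n}^{\infty} M_k < \epsilon.$$

Since $0 \leq |f_k(x)| \leq M_k$, for all $x \in A$, we have

$$\left| \sum_{k=n}^{\infty} f_k(x) \right| \leq \sum_{k=n}^{\infty} |f_k(x)| \leq \sum_{k=n}^{\infty} M_k < \epsilon,$$

and we are done. $\Box$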
Let us look at an easy example. Consider the series $\sum_{k=0}^{\infty} \frac{x^k}{k!}$, which from
calculus we know to be the Taylor series for $e^x$. We will use the Weierstrass
M-test to show that this series converges uniformly on any interval $[-a, a]$.
Here we have $f_k(x) = \frac{x^k}{k!}$. Set

$$M_k = \frac{a^k}{k!}.$$

Note that for all $x \in [-a, a]$, we have $0 \leq |x|^k/k! \leq a^k/k!$. Thus if we can
show that the series $\sum_{k=0}^{\infty} M_k = \sum_{k=0}^{\infty} \frac{a^k}{k!}$ converges, we will have uniform
convergence. By the ratio test, $\sum_{k=0}^{\infty} \frac{a^k}{k!}$ will converge if the limit of ratios

$$\lim_{k \to \infty} \frac{M_{k+1}}{M_k} = \lim_{k \to \infty} \frac{a^{k+1}/(k+1)!}{a^k/k!}$$

exists and is strictly less than one. But we have

$$\lim_{k \to \infty} \frac{a^{k+1}/(k+1)!}{a^k/k!} = \lim_{k \to \infty} \frac{a}{k + 1} = 0.$$

Thus the Taylor series for $e^x$ will converge uniformly on any closed interval.
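
The M-test bound can be watched at work numerically. The sketch below compares the largest error of the $n$th partial sum of $\sum_{k \geq 0} x^k/k!$ over a grid in $[-a, a]$ with the tail $\sum_{k > n} a^k/k!$ of the dominating series. All names (`terms`, `partial_sum`, `tail_bound`, `sup_error`), the value $a = 3$, the grid and the truncation of the tail after 80 extra terms are assumptions made for illustration.

```python
import math


def terms(x, count):
    """The first `count` terms x**k / k! for k = 0, 1, ..., count - 1."""
    out, term = [], 1.0
    for k in range(count):
        out.append(term)
        term *= x / (k + 1)
    return out


def partial_sum(x, n):
    """Sum of x**k / k! for k = 0, ..., n."""
    return sum(terms(x, n + 1))


def tail_bound(a, n, extra=80):
    """Sum of M_k = a**k / k! for k = n + 1, ..., n + extra (a truncated tail)."""
    return sum(terms(a, n + 1 + extra)[n + 1:])


a = 3.0
grid = [-a + 2.0 * a * i / 600 for i in range(601)]


def sup_error(n):
    """Largest error of the n-th partial sum over the grid in [-a, a]."""
    return max(abs(math.exp(x) - partial_sum(x, n)) for x in grid)


for n in (5, 10, 15, 20):
    print(n, sup_error(n), tail_bound(a, n))
```

The single tail bound, which does not depend on $x$, controls the error at every grid point at once; this is the uniformity that the M-test delivers.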


2.9 Weierstrass’ Example 


Our goal is to find a function that is continuous everywhere but differentiable
nowhere. When Weierstrass first constructed such functions in the late 
1800s, mathematicians were shocked and surprised. The conventional wis- 
dom of the time was that no such function could exist. The moral of this 
example is that one has to be careful of geometric intuition. 

We will follow closely the presentation given by Spivak in his Calculus 
[102] in Chapter 23. We need a bit of notation. Set $\{x\}$ = distance from
$x$ to the nearest integer. For example, $\{\frac{3}{4}\} = \frac{1}{4}$ and $\{1.3289\} = .3289$, etc.


The graph of $\{x\}$ is:

[Figure: the graph of {x}.]

Define

$$f(x) = \sum_{k=0}^{\infty} \frac{1}{10^k} \{10^k x\}.$$


Our goal is: 


Theorem 2.9.1 The function f(x) is continuous everywhere but differen- 
tiable nowhere. 


First for the intuition. For simplicity we restrict the domain to be the
unit interval $(0, 1)$. For $k = 1$, we have the function $\frac{1}{10}\{10x\}$, which has a
graph:

[Figure: the graph of (1/10){10x} on the unit interval.]

This function is continuous everywhere but not differentiable at the 19
points $.05, .1, .15, \ldots, .95$. Then $\{x\} + \frac{1}{10}\{10x\}$ has the graph:

[Figure: the graph of {x} + (1/10){10x} on the unit interval.]

and is continuous everywhere but not differentiable at $.05, .1, .15, \ldots, .95$.
For $k = 2$, the function $\frac{1}{100}\{100x\}$ is continuous everywhere but is not
differentiable at its 199 sharp points. Then the partial sum $\{x\} + \frac{1}{10}\{10x\} +
\frac{1}{100}\{100x\}$ is continuous everywhere but not differentiable at the 199 sharp
points. In a similar fashion, $\frac{1}{1000}\{1000x\}$ is also continuous, but now loses
differentiability at its 1999 sharp points. As we continue, at every sharp
edge, we lose differentiability, but at no place is there a break in the graph.
As we add all the terms in $\sum \frac{1}{10^k}\{10^k x\}$, we eventually lose differentiability
at every point. The pictures are compelling, but of course we need a proof.
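
Since the pictures cannot be reproduced here, the partial sums can at least be evaluated. The following sketch assumes the series starts at $k = 0$, as in the proof below; the names `dist` and `partial_sum` are illustrative, and floating-point arithmetic limits how many terms are meaningful.

```python
import math


def dist(x):
    """{x}: the distance from x to the nearest integer."""
    return abs(x - math.floor(x + 0.5))


def partial_sum(x, n):
    """Sum of 10**(-k) * {10**k * x} for k = 0, 1, ..., n."""
    return sum(dist(10 ** k * x) / 10 ** k for k in range(n + 1))


# Each added term is at most (1/2) * 10**(-k), so consecutive partial sums
# stay close together even though new corners keep appearing.
for x in (0.1, 0.25, 0.3333, 0.75):
    print(x, [round(partial_sum(x, n), 6) for n in range(5)])
```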
Proof: (We continue to follow Spivak)

The easy part is in showing that $f(x) = \sum_{k=0}^{\infty} \frac{1}{10^k}\{10^k x\}$ is continuous, as
this will be a simple application of the Weierstrass M-test. We know that
$\{x\} \leq \frac{1}{2}$ for all $x$. Thus we have, for all $k$, that

$$\frac{1}{10^k}\{10^k x\} \leq \frac{1}{2 \cdot 10^k}.$$

The series

$$\sum_{k=0}^{\infty} \frac{1}{2 \cdot 10^k} = \frac{1}{2} \sum_{k=0}^{\infty} \frac{1}{10^k}$$

is a geometric series and thus must converge (just use the ratio test). Then
by the Weierstrass M-test, the series $f(x) = \sum_{k=0}^{\infty} \frac{1}{10^k}\{10^k x\}$ converges
uniformly. Since each function $\frac{1}{10^k}\{10^k x\}$ is continuous, we have that $f(x)$
must be continuous.

It is much harder to show that $f(x)$ is not differentiable at any point;
this will take some delicate work. Fix any $x$. We must show that

$$\lim_{h \to 0} \frac{f(x + h) - f(x)}{h}$$

does not exist. We will find a sequence, $h_m$, of numbers that approach zero
such that the sequence $\frac{f(x + h_m) - f(x)}{h_m}$ does not converge.
Write $x$ in its decimal expansion:

$$x = a.a_1 a_2 \ldots,$$

where $a$ is zero or one and each $a_k$ is an integer between zero and nine. Set

$$h_m = \begin{cases} 10^{-m} & \text{if } a_m \neq 4 \text{ and } a_m \neq 9 \\ -10^{-m} & \text{if } a_m = 4 \text{ or } a_m = 9. \end{cases}$$

Then

$$x + h_m = \begin{cases} a.a_1 \ldots (a_m + 1) a_{m+1} \ldots & \text{if } a_m \neq 4 \text{ and } a_m \neq 9 \\ a.a_1 \ldots (a_m - 1) a_{m+1} \ldots & \text{if } a_m = 4 \text{ or } a_m = 9. \end{cases}$$

We will be looking at various $10^n (x + h_m)$. The $10^n$ factor just shifts
where the decimal point lands. In particular, if $n \geq m$, then

$$10^n (x + h_m) = a a_1 \ldots (a_m \pm 1) a_{m+1} \ldots a_n . a_{n+1} \ldots,$$

in which case

$$\{10^n (x + h_m)\} = \{10^n x\}.$$

If $n < m$, then $10^n (x + h_m) = a a_1 \ldots a_n . a_{n+1} \ldots (a_m \pm 1) a_{m+1} \ldots$, in
which case we have

$$\{10^n (x + h_m)\} = \begin{cases} 0.a_{n+1} \ldots (a_m + 1) a_{m+1} \ldots & \text{if } a_m \neq 4 \text{ and } a_m \neq 9 \\ 0.a_{n+1} \ldots (a_m - 1) a_{m+1} \ldots & \text{if } a_m = 4 \text{ or } a_m = 9. \end{cases}$$

We are interested in the limit of

$$\frac{f(x + h_m) - f(x)}{h_m} = \sum_{k=0}^{\infty} \frac{\frac{1}{10^k}\{10^k (x + h_m)\} - \frac{1}{10^k}\{10^k x\}}{h_m}.$$

Since $\{10^k (x + h_m)\} = \{10^k x\}$ for $k \geq m$, the above infinite series is
actually the finite sum:

$$\sum_{k=0}^{m-1} \frac{\frac{1}{10^k}\{10^k (x + h_m)\} - \frac{1}{10^k}\{10^k x\}}{h_m} = \sum_{k=0}^{m-1} \pm 10^{m-k} \left( \{10^k (x + h_m)\} - \{10^k x\} \right).$$

We will show that each $\pm 10^{m-k}(\{10^k (x + h_m)\} - \{10^k x\})$ is a plus or
minus one. Then the above finite sum is a sum of $m$ plus and minus ones and
thus cannot be converging to a number, showing that the function is not
differentiable.

There are two cases. Still following Spivak, we will only consider the
case when the fractional part $.a_{k+1} a_{k+2} \ldots$ of $10^k x$ is less than $\frac{1}{2}$ (the case
when $.a_{k+1} a_{k+2} \ldots \geq \frac{1}{2}$ is left to the
reader). Here is why we had to break our definition of the $h_m$ into two
separate cases. By our choice of $h_m$, $\{10^k (x + h_m)\}$ and $\{10^k x\}$ differ only
in the $(m - k)$th term of the decimal expansion. Thus

$$\{10^k (x + h_m)\} - \{10^k x\} = \pm \frac{1}{10^{m-k}}.$$

Then $10^{m-k}(\{10^k (x + h_m)\} - \{10^k x\})$ will be, as predicted, a plus or minus
one. $\Box$


2.10 Books 


The development of $\epsilon$ and $\delta$ analysis was one of the main triumphs of 1800s
mathematics; this means that undergraduates for most of the last hundred 
years have had to learn these techniques. There are many texts. The 
one that I learned from and one of my favorite math books of all times 
is Michael Spivak’s Calculus [102]. Though called a calculus book, even 
Spivak admits, in the preface to the second and third editions, that a more 
apt title would be “An Introduction to Real Analysis”. The exposition is 
wonderful and the problems are excellent. 

Other texts for this level of real analysis include books by Bartle [6], 
Berberian [7], Bressoud [13], Lang [80], Protter and Morrey [94] and Rudin 
[96], among many others.


2.11 Exercises


1. Let $f(x)$ and $g(x)$ be differentiable functions. Using the definition of
derivatives, show

a. $(f + g)' = f' + g'$.

b. $(fg)' = f'g + fg'$.

c. Assume that $f(x) = c$, where $c$ is a constant. Show that the derivative
of $f(x)$ is zero.

2. Let $f(x)$ and $g(x)$ be integrable functions.

a. Using the definition of integration, show that the sum $f(x) + g(x)$ is
an integrable function.

b. Using the Fundamental Theorem of Calculus and problem 1.a, show
that the sum $f(x) + g(x)$ is an integrable function.

3. The goal of this problem is to calculate $\int_0^1 x\,dx$ three ways. The first
two methods are not supposed to be challenging.

a. Look at the graph of the function $y = x$. Note what type of geometric
object this is, and then get the area under the curve.

b. Find a function $f(x)$ such that $f'(x) = x$ and then use the Fundamental
Theorem of Calculus to find $\int_0^1 x\,dx$.

c. This has two parts. First show by induction that

$$\sum_{i=1}^{n} i = \frac{n(n + 1)}{2}.$$

Then use the definition of the integral to find $\int_0^1 x\,dx$.

4. Let f(x) be differentiable. Show that f(x) must be continuous. (Note: 
intuitively this makes a lot of sense; after all, if the function f has breaks 
in its graph, it should not then have well-defined tangents. This problem 
is an exercise in the definitions.) 

5. On the interval $[0, 1]$, define

$$f(x) = \begin{cases} 1 & \text{if } x \text{ is rational} \\ 0 & \text{if } x \text{ is not rational.} \end{cases}$$


Show that f(x) is not integrable. (Note: you will need to use the fact that 
any interval of any positive length must contain a rational number and an 
irrational number. In other words, both the rational and the irrational 
numbers are dense.) 

6. This is a time-consuming problem but is very worthwhile. Find a calculus
textbook. Go through its proof of the chain rule, namely that

$$\frac{d}{dx} f(g(x)) = f'(g(x)) \cdot g'(x).$$


7. Go again to the calculus book that you used in problem six. Find the 
chapter on infinite series. Go carefully through the proofs for the following 
tests for convergence: the integral test, the comparison test, the limit com- 
parison test, the ratio test and the root test. Put all of these tests into the 
language of $\epsilon$ and $\delta$ real analysis.


Chapter 3 


Calculus for Vector-Valued Functions


Basic Object: $\mathbb{R}^n$
Basic Map: Differentiable functions $f : \mathbb{R}^n \to \mathbb{R}^m$
Basic Goal: Inverse Function Theorem


3.1 Vector-Valued Functions 


A function $f : \mathbb{R}^n \to \mathbb{R}^m$ is called vector-valued since for any vector $x$ in
$\mathbb{R}^n$, the value (or image) of $f(x)$ is a vector in $\mathbb{R}^m$. If $(x_1, \ldots, x_n)$ is a
coordinate system for $\mathbb{R}^n$, the function $f$ can be described in terms of $m$
real-valued functions by simply writing:

$$f(x_1, \ldots, x_n) = \begin{pmatrix} f_1(x_1, \ldots, x_n) \\ \vdots \\ f_m(x_1, \ldots, x_n) \end{pmatrix}.$$

Such functions occur everywhere. For example, let $f : \mathbb{R} \to \mathbb{R}^2$ be defined
as

$$f(t) = \begin{pmatrix} \cos(t) \\ \sin(t) \end{pmatrix}.$$

Here $t$ is the coordinate for $\mathbb{R}$. Of course this is just the unit circle parametrized
by its angle with the $x$-axis.

[Figure: the unit circle with the point (cos(t), sin(t)) marked.]

This can also be written as $x = \cos(t)$ and $y = \sin(t)$.
For another example, consider the function $f : \mathbb{R}^2 \to \mathbb{R}^3$ given by

$$f(x_1, x_2) = \begin{pmatrix} \cos x_1 \\ \sin x_1 \\ x_2 \end{pmatrix}.$$

This function $f$ maps the $(x_1, x_2)$ plane to a cylinder in space.
Most examples are quite a bit more complicated, too complicated for
pictures to even be drawn, much less used.


3.2 Limits and Continuity of Vector-Valued Functions


The key idea in defining limits for vector-valued functions is that the
Pythagorean Theorem gives a natural way for measuring distance in $\mathbb{R}^n$.


Definition 3.2.1 Let $a = (a_1, \ldots, a_n)$ and $b = (b_1, \ldots, b_n)$ be two points
in $\mathbb{R}^n$. Then the distance between $a$ and $b$, denoted by $|a - b|$, is

$$|a - b| = \sqrt{(a_1 - b_1)^2 + (a_2 - b_2)^2 + \cdots + (a_n - b_n)^2}.$$

The length of $a$ is defined by

$$|a| = \sqrt{a_1^2 + \cdots + a_n^2}.$$

Note that we are using the word “length” since we can think of the point
$a$ in $\mathbb{R}^n$ as a vector from the origin to the point.

Once we have a notion of distance, we can apply the standard tools
from $\epsilon$ and $\delta$ style real analysis. For example, the reasonable definition of
limit must be:


Definition 3.2.2 The function $f : \mathbb{R}^n \to \mathbb{R}^m$ has limit

$$L = (L_1, \ldots, L_m) \in \mathbb{R}^m$$

at the point $a = (a_1, \ldots, a_n) \in \mathbb{R}^n$ if given any $\epsilon > 0$, there is some $\delta > 0$
such that for all $x \in \mathbb{R}^n$, if

$$0 < |x - a| < \delta,$$

we have

$$|f(x) - L| < \epsilon.$$

We denote this limit by

$$\lim_{x \to a} f(x) = L$$

or by $f(x) \to L$ as $x \to a$.
Of course, continuity must now be defined by:


Definition 3.2.3 The function $f : \mathbb{R}^n \to \mathbb{R}^m$ is continuous at a point $a$
in $\mathbb{R}^n$ if $\lim_{x \to a} f(x) = f(a)$.


Both the definitions of limit and continuity rely on the existence of
a distance. Given different norms (distances) we will have corresponding
definitions for limits and for continuity.


3.3 Differentiation and Jacobians


For single variable functions, the derivative is the slope of the tangent line 
(which is, recall, the best linear approximation to the graph of the original 
function) and can be used to find the equation for this tangent line. In a 
similar fashion, we want the derivative of a vector-valued function to be a 
tool that can be used to find the best linear approximation to the function. 

We will first give the definition for the vector-valued derivative and then 
discuss the intuitions behind it. In particular we want this definition for 
vector-valued functions to agree with the earlier definition of a derivative 
for the case of single variable real-valued functions. 


Definition 3.3.1 A function $f : \mathbb{R}^n \to \mathbb{R}^m$ is differentiable at $a \in \mathbb{R}^n$ if
there is an $m \times n$ matrix $A : \mathbb{R}^n \to \mathbb{R}^m$ such that

$$\lim_{x \to a} \frac{|f(x) - f(a) - A \cdot (x - a)|}{|x - a|} = 0.$$

If such a limit exists, the matrix $A$ is denoted by $Df(a)$ and is called the
Jacobian.


Note that $f(x)$, $f(a)$ and $A \cdot (x - a)$ are all in $\mathbb{R}^m$ and hence

$$|f(x) - f(a) - A \cdot (x - a)|$$

is the length of a vector in $\mathbb{R}^m$. Likewise, $x - a$ is a vector in $\mathbb{R}^n$, forcing
$|x - a|$ to be the length of a vector in $\mathbb{R}^n$. Further, usually there is an easy
way to compute the matrix $A$, which we will see in a moment. Also, if the
Jacobian matrix $Df(a)$ exists, one can show that it is unique, up to change
of bases for $\mathbb{R}^n$ and $\mathbb{R}^m$.

We definitely want this definition to agree with the usual definition of
derivative for a function $f : \mathbb{R} \to \mathbb{R}$. With $f : \mathbb{R} \to \mathbb{R}$, recall that the
derivative $f'(a)$ was defined to be the limit

$$f'(a) = \lim_{x \to a} \frac{f(x) - f(a)}{x - a}.$$

Unfortunately, for a vector-valued function $f : \mathbb{R}^n \to \mathbb{R}^m$ with $n$ and $m$
larger than one, this one-variable definition is nonsensical, since we cannot
divide vectors. We can, however, algebraically manipulate the above
one-variable limit until we have a statement that can be naturally generalized
to functions $f : \mathbb{R}^n \to \mathbb{R}^m$ and which will agree with our definition.
Return to the one-variable case $f : \mathbb{R} \to \mathbb{R}$. Then

$$f'(a) = \lim_{x \to a} \frac{f(x) - f(a)}{x - a}$$

is true if and only if

$$0 = \lim_{x \to a} \frac{f(x) - f(a)}{x - a} - f'(a),$$

which is equivalent to

$$0 = \lim_{x \to a} \frac{f(x) - f(a) - f'(a)(x - a)}{x - a}$$

or

$$0 = \lim_{x \to a} \frac{|f(x) - f(a) - f'(a)(x - a)|}{|x - a|}.$$

This last statement, at least formally, makes sense for functions $f : \mathbb{R}^n \to \mathbb{R}^m$,
provided we replace $f'(a)$ (a number and hence a $1 \times 1$ matrix) by an
$m \times n$ matrix, namely the Jacobian $Df(a)$.

As with the one-variable derivative, there is a (usually) straightforward 
method for computing the derivative without resorting to the actual taking 
of a limit, allowing us to actually calculate the Jacobian. 


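Theorem 3.3.1 Let the function $f : \mathbb{R}^n \to \mathbb{R}^m$ be given by the $m$
differentiable functions $f_1(x_1, \ldots, x_n), \ldots, f_m(x_1, \ldots, x_n)$, so that

$$f(x_1, \ldots, x_n) = \begin{pmatrix} f_1(x_1, \ldots, x_n) \\ \vdots \\ f_m(x_1, \ldots, x_n) \end{pmatrix}.$$

Then $f$ is differentiable and the Jacobian is

$$Df(x) = \begin{pmatrix} \dfrac{\partial f_1}{\partial x_1} & \cdots & \dfrac{\partial f_1}{\partial x_n} \\ \vdots & & \vdots \\ \dfrac{\partial f_m}{\partial x_1} & \cdots & \dfrac{\partial f_m}{\partial x_n} \end{pmatrix}.$$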


The proof, found in most books on vector calculus, is a relatively
straightforward calculation stemming from the definition of partial derivatives. But
to understand it, we look at the following example. Consider our earlier
example of the function $f : \mathbb{R}^2 \to \mathbb{R}^3$ given by

$$f(x_1, x_2) = \begin{pmatrix} \cos x_1 \\ \sin x_1 \\ x_2 \end{pmatrix},$$

which maps the $(x_1, x_2)$ plane to a cylinder in space. Then the Jacobian,
the derivative of this vector-valued function, will be

$$Df(x_1, x_2) = \begin{pmatrix} \partial \cos(x_1)/\partial x_1 & \partial \cos(x_1)/\partial x_2 \\ \partial \sin(x_1)/\partial x_1 & \partial \sin(x_1)/\partial x_2 \\ \partial x_2/\partial x_1 & \partial x_2/\partial x_2 \end{pmatrix} = \begin{pmatrix} -\sin x_1 & 0 \\ \cos x_1 & 0 \\ 0 & 1 \end{pmatrix}.$$
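The Jacobian of the cylinder map can be checked numerically. The following is a minimal sketch, not part of the original text: the helper names `numerical_jacobian` and `exact_jacobian`, the step size `h`, and the sample point (0.7, 2.0) are all illustrative assumptions. It approximates each partial derivative by a central difference and compares the result with the matrix computed by hand.

```python
import math

def f(x1, x2):
    """The cylinder map from the text: (x1, x2) -> (cos x1, sin x1, x2)."""
    return [math.cos(x1), math.sin(x1), x2]

def numerical_jacobian(func, point, h=1e-6):
    """Approximate the Jacobian of func at point by central differences.

    Entry [i][j] approximates the partial derivative of the i-th output
    with respect to the j-th input. (Hypothetical helper for illustration.)
    """
    n = len(point)
    m = len(func(*point))
    jac = [[0.0] * n for _ in range(m)]
    for j in range(n):
        plus = list(point)
        minus = list(point)
        plus[j] += h
        minus[j] -= h
        f_plus = func(*plus)
        f_minus = func(*minus)
        for i in range(m):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return jac

def exact_jacobian(x1, x2):
    """The 3 x 2 matrix of partial derivatives computed by hand."""
    return [[-math.sin(x1), 0.0],
            [math.cos(x1), 0.0],
            [0.0, 1.0]]

approx = numerical_jacobian(f, (0.7, 2.0))
exact = exact_jacobian(0.7, 2.0)
for row_approx, row_exact in zip(approx, exact):
    print(row_approx, row_exact)
```

The two matrices should agree up to the error of the finite-difference approximation, which is the content of Theorem 3.3.1 for this particular map.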


One of the most difficult concepts and techniques in beginning calculus 
is the chain rule, which tells us how to differentiate the composition of 
two functions. For vector-valued forms, the chain rule can be easily stated 
(though we will not give the proof here). It should relate the derivative of 
the composition of functions with the derivatives of each component part 
and in fact has a quite clean flavor, namely: 


Theorem 3.3.2 Let $f : \mathbb{R}^n \to \mathbb{R}^m$ and $g : \mathbb{R}^m \to \mathbb{R}^l$ be differentiable
functions. Then the composition function

$$g \circ f : \mathbb{R}^n \to \mathbb{R}^l$$

is also differentiable with derivative given by: if $f(a) = b$, then

$$D(g \circ f)(a) = D(g)(b) \cdot D(f)(a).$$

Thus the chain rule says that to find the derivative of the composition $g \circ f$,
one multiplies the Jacobian matrix for $g$ times the Jacobian matrix for $f$.
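As an illustration of the chain rule, here is a small sketch under stated assumptions: $f$ is the cylinder map from the earlier example, while the function $g(u, v, w) = uv + w^2$, the helper `matmul`, and the sample point are hypothetical choices made only for this example. The product $D(g)(b) \cdot D(f)(a)$ is compared with the derivative of $g \circ f$ computed directly.

```python
import math

def Df(x1, x2):
    """Jacobian (3 x 2) of f(x1, x2) = (cos x1, sin x1, x2)."""
    return [[-math.sin(x1), 0.0],
            [math.cos(x1), 0.0],
            [0.0, 1.0]]

def Dg(u, v, w):
    """Jacobian (1 x 3) of the assumed function g(u, v, w) = u*v + w**2."""
    return [[v, u, 2.0 * w]]

def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def D_composition_direct(x1, x2):
    """Derivative of (g o f)(x1, x2) = cos(x1) sin(x1) + x2**2, by hand."""
    return [[math.cos(x1) ** 2 - math.sin(x1) ** 2, 2.0 * x2]]

a = (0.3, 1.5)
b = (math.cos(a[0]), math.sin(a[0]), a[1])   # b = f(a)
via_chain_rule = matmul(Dg(*b), Df(*a))
direct = D_composition_direct(*a)
print(via_chain_rule, direct)
```

Note that the shapes work out as the theorem requires: a $1 \times 3$ matrix times a $3 \times 2$ matrix gives the $1 \times 2$ Jacobian of $g \circ f : \mathbb{R}^2 \to \mathbb{R}$.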

One of the key intuitions behind the one-variable derivative is that $f'(a)$
is the slope of the tangent line to the curve $y = f(x)$ at the point $(a, f(a))$
in the plane $\mathbb{R}^2$. In fact, the tangent line through $(a, f(a))$ will have the
equation

$$y = f(a) + f'(a)(x - a).$$

This line $y = f(a) + f'(a)(x - a)$ is the closest linear approximation to
the function $y = f(x)$ at $x = a$.

Thus a reasonable criterion for the derivative of $f : \mathbb{R}^n \to \mathbb{R}^m$ should
be that we can use this derivative to find a linear approximation to the
geometric object $y = f(x)$, which lies in the space $\mathbb{R}^{n+m}$. But this is
precisely what the definition

$$\lim_{x \to a} \frac{|f(x) - f(a) - Df(a)(x - a)|}{|x - a|} = 0$$

does. Namely, $f(x)$ is approximately equal to the linear function

$$f(a) + Df(a) \cdot (x - a).$$

Here $Df(a)$, as an $m \times n$ matrix, is a linear map from $\mathbb{R}^n \to \mathbb{R}^m$ and $f(a)$,
as an element of $\mathbb{R}^m$, is a translation. Thus the vector $y = f(x)$ can be
approximated by

$$y \approx f(a) + Df(a) \cdot (x - a).$$


3.4 The Inverse Function Theorem 


Matrices are easy to understand, while vector-valued functions can be quite 
confusing. As seen in the last section, one of the points of having a deriva- 
tive for vector-valued functions is that we can approximate the original 
function by a matrix, namely the Jacobian. The general question is now 
how good of an approximation do we have. What decent properties for ma- 
trices can be used to get corresponding decent properties for vector-valued 
functions? 

This type of question could lead us to the heart of numerical analysis. 
We will limit ourselves to seeing that if the derivative matrix (the Jaco- 
bian) is invertible, then the original vector-valued function must also have 
an inverse, at least locally. This theorem, and its close relative the Im- 
plicit Function Theorem, are key technical tools that appear throughout 
mathematics. 


Theorem 3.4.1 (Inverse Function Theorem) For a vector-valued
continuously differentiable function $f : \mathbb{R}^n \to \mathbb{R}^n$, assume that $\det Df(a) \neq 0$
at some point $a$ in $\mathbb{R}^n$. Then there is an open neighborhood $U$ of $a$ in $\mathbb{R}^n$
and an open neighborhood $V$ of $f(a)$ in $\mathbb{R}^n$ such that $f : U \to V$ is one to
one, onto and has a differentiable inverse $g : V \to U$ (i.e., $g \circ f : U \to U$
is the identity and $f \circ g : V \to V$ is the identity).
Why should a function $f$ have an inverse? Let us think of $f$ as being
approximated by the linear function

$$f(x) \approx f(a) + Df(a) \cdot (x - a).$$

From the key theorem of linear algebra, the matrix $Df(a)$ is invertible if and
only if $\det Df(a) \neq 0$. Thus $f(x)$ should be invertible if $f(a) + Df(a) \cdot (x - a)$
is invertible, which should happen precisely when $\det Df(a) \neq 0$. In fact,
consider

$$y = f(a) + Df(a) \cdot (x - a).$$

Here the vector $y$ is written explicitly as a function of the variable vector
$x$. But if the inverse to $Df(a)$ exists, then we can write $x$ explicitly as a
function of $y$, namely as:

$$x = a + Df(a)^{-1} \cdot (y - f(a)).$$

In particular, we should have, if the inverse function is denoted by $f^{-1}$,
that its derivative is simply the inverse of the derivative of the original
function $f$, namely

$$Df^{-1}(b) = Df(a)^{-1},$$

where $b = f(a)$. This follows from the chain rule and since the composition
is $f^{-1} \circ f = I$.
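The relation $Df^{-1}(b) = Df(a)^{-1}$ can be tested on a concrete map. The sketch below is an illustration only: the map $f(x, y) = (e^x \cos y, e^x \sin y)$, its local inverse $g(u, v) = (\tfrac{1}{2}\log(u^2 + v^2), \operatorname{atan2}(v, u))$ (valid for $-\pi < y < \pi$), and the helper `inverse_2x2` are assumptions chosen for the example and do not appear in the text.

```python
import math

def Df(x, y):
    """Jacobian of the assumed map f(x, y) = (e^x cos y, e^x sin y)."""
    e = math.exp(x)
    return [[e * math.cos(y), -e * math.sin(y)],
            [e * math.sin(y), e * math.cos(y)]]

def Dg(u, v):
    """Jacobian of the local inverse g(u, v) = (log sqrt(u^2 + v^2), atan2(v, u))."""
    r2 = u * u + v * v
    return [[u / r2, v / r2],
            [-v / r2, u / r2]]

def inverse_2x2(M):
    """Inverse of a 2 x 2 matrix; fails exactly when the determinant is zero."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]

a = (0.2, 0.5)
b = (math.exp(a[0]) * math.cos(a[1]), math.exp(a[0]) * math.sin(a[1]))  # b = f(a)
lhs = Dg(*b)                 # derivative of the inverse at b
rhs = inverse_2x2(Df(*a))    # inverse of the derivative at a
print(lhs, rhs)
```

Here $\det Df(a) = e^{2x}$ is never zero, so the hypothesis of the Inverse Function Theorem holds at every point, even though this $f$ is not globally one to one.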
For the case of $f : \mathbb{R} \to \mathbb{R}$, the idea behind the Inverse Function
Theorem can be captured in pictures:


[Figure: a graph in the plane, labeled "locally no inverse function".]





If the slope of the tangent line, f'(a), is not zero, the tangent line will not 
be horizontal, and hence there will be an inverse. 
In the statement of the theorem, we used the technical term “open set”.
There will be much more about this in the next chapter on topology. For
now, think of an open set as a technical means allowing us to talk about all
points near the points $a$ and $f(a)$. More precisely, by an open neighborhood
$U$ of a point $a$ in $\mathbb{R}^n$, we mean a set $U$ containing $a$ such that, given any
point $p \in U$, there is a (small) positive $\epsilon$ such that

$$\{x : |x - p| < \epsilon\} \subset U.$$

In pictures, for example,

$$\{(x, y) \in \mathbb{R}^2 : |(x, y) - (0, 0)| = \sqrt{x^2 + y^2} \leq 1\}$$

is not open (it is in fact closed, meaning that its complement is open in the
plane $\mathbb{R}^2$),












while the set 
$$\{(x, y) \in \mathbb{R}^2 : |(x, y) - (0, 0)| < 1\}$$


is open. 





3.5 Implicit Function Theorem


Rarely can a curve in the plane be described as the graph of a one-variable
function

$$y = f(x),$$

though much of our early mathematical experiences are with such functions.
For example, it is impossible to write the circle

$$x^2 + y^2 = 1$$

as the graph of a one-variable function, since for any value of $x$ (besides $-1$
and $1$) there are either no corresponding values of $y$ on the circle or two
corresponding values of $y$ on the circle. This is unfortunate. Curves in the
plane that can be cleanly written as $y = f(x)$ are simply easier to work
with.

However, we can split the circle into its top and bottom halves. 
For each half, the variable $y$ can be written as a function of $x$: for the top
half, we have

$$y = \sqrt{1 - x^2},$$

and for the bottom half,

$$y = -\sqrt{1 - x^2}.$$


Only at the two points (1,0) and (—1,0) are there problems. The difficulty 
can be traced to the fact that at these two points (and only at these two 
points) the tangent lines of the circle are perpendicular to the x-axis. 

This is the key. The tangent line of a circle is the best linear approxi- 
mation to the circle. If the tangent line can be written as 


$$y = mx + b,$$


then it should be no surprise that the circle can be written as y = f(x), at 
least locally. 

The goal of the Implicit Function Theorem is to find a computational 
tool that will allow us to determine when the zero locus of a bunch of 
functions in some $\mathbb{R}^N$ can locally be written as the graph of a function and
thus in the form y = f(x), where the x denote the independent variables 
and the y will denote the dependent variables. Buried (not too deeply) is 
the intuition that we want to know about the tangent space of the zero 
locus of functions. 

The notation is a bit cumbersome. Label a coordinate system for $\mathbb{R}^{n+k}$
by

$$x_1, \ldots, x_n, y_1, \ldots, y_k,$$

which we will frequently abbreviate as $(x, y)$. Let

$$f_1(x_1, \ldots, x_n, y_1, \ldots, y_k), \ldots, f_k(x_1, \ldots, x_n, y_1, \ldots, y_k)$$

be $k$ continuously differentiable functions, which will frequently be
written as

$$f_1(x, y), \ldots, f_k(x, y).$$

Set

$$V = \{(x, y) \in \mathbb{R}^{n+k} : f_1(x, y) = 0, \ldots, f_k(x, y) = 0\}.$$

We want to determine when, given a point $(a, b) \in V$ (where $a \in \mathbb{R}^n$ and
$b \in \mathbb{R}^k$), there are $k$ functions

$$\rho_1(x_1, \ldots, x_n), \ldots, \rho_k(x_1, \ldots, x_n)$$
defined in a neighborhood of the point $a$ in $\mathbb{R}^n$ such that $V$ can be
described, in a neighborhood of $(a, b)$ in $\mathbb{R}^{n+k}$, as

$$\{(x, y) \in \mathbb{R}^{n+k} : y_1 = \rho_1(x_1, \ldots, x_n), \ldots, y_k = \rho_k(x_1, \ldots, x_n)\},$$

which of course is frequently written in the shorthand of

$$V = \{y_1 = \rho_1(x), \ldots, y_k = \rho_k(x)\},$$

or even more succinctly as

$$V = \{y = \rho(x)\}.$$

Thus we want to find $k$ functions $\rho_1, \ldots, \rho_k$ such that, writing
$\rho(x) = (\rho_1(x), \ldots, \rho_k(x))$, for all $x$ in that neighborhood of $a$ we
have

$$f_1(x, \rho(x)) = 0, \ldots, f_k(x, \rho(x)) = 0.$$

Thus we want to know when the $k$ functions $f_1, \ldots, f_k$ can be used to
define (implicitly, since it does take work to actually construct them) the
$k$ functions $\rho_1, \ldots, \rho_k$.


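Theorem 3.5.1 (Implicit Function Theorem) Let $f_1(x, y), \ldots, f_k(x, y)$
be $k$ continuously differentiable functions on $\mathbb{R}^{n+k}$ and suppose that
$p = (a, b) \in \mathbb{R}^{n+k}$ is a point for which

$$f_1(a, b) = 0, \ldots, f_k(a, b) = 0.$$

Suppose that at the point $p$ the $k \times k$ matrix

$$M = \begin{pmatrix} \dfrac{\partial f_1}{\partial y_1}(p) & \cdots & \dfrac{\partial f_1}{\partial y_k}(p) \\ \vdots & & \vdots \\ \dfrac{\partial f_k}{\partial y_1}(p) & \cdots & \dfrac{\partial f_k}{\partial y_k}(p) \end{pmatrix}$$

is invertible. Then in a neighborhood of $a$ in $\mathbb{R}^n$ there are $k$ unique,
differentiable functions

$$\rho_1(x), \ldots, \rho_k(x)$$

such that, writing $\rho(x) = (\rho_1(x), \ldots, \rho_k(x))$,

$$f_1(x, \rho(x)) = 0, \ldots, f_k(x, \rho(x)) = 0.$$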


Return to the circle. Here the function is $f(x, y) = x^2 + y^2 - 1 = 0$.
The matrix $M$ in the theorem will be the $1 \times 1$ matrix:

$$\frac{\partial f}{\partial y} = 2y.$$

This matrix is not invertible (the number is zero) only where $y = 0$, namely
at the two points $(1, 0)$ and $(-1, 0)$: only at these two points will there not
be an implicitly defined function $\rho$.
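The implicitly defined function for the circle can be computed numerically even without the closed formula. The following is a minimal sketch, assuming Newton's method in the variable $y$ as the solver (a swapped-in technique, not the book's proof); the name `implicit_y`, the tolerance, and the sample point $(a, b) = (0.6, 0.8)$ are hypothetical.

```python
import math

def f(x, y):
    """The circle as a zero locus: f(x, y) = x^2 + y^2 - 1."""
    return x * x + y * y - 1.0

def df_dy(x, y):
    """The 1 x 1 matrix M from the text: the partial derivative of f in y."""
    return 2.0 * y

def implicit_y(x, y_guess, steps=50, tol=1e-14):
    """Solve f(x, y) = 0 for y near y_guess by Newton's method in y.

    Each Newton step divides by df/dy, so the method needs exactly the
    invertibility condition of the Implicit Function Theorem.
    """
    y = y_guess
    for _ in range(steps):
        slope = df_dy(x, y)
        if slope == 0.0:
            raise ZeroDivisionError("df/dy vanishes; the theorem does not apply here")
        step = f(x, y) / slope
        y -= step
        if abs(step) < tol:
            break
    return y

a, b = 0.6, 0.8                      # a point on the circle with b != 0
rho_top = implicit_y(0.61, b)        # follows the top half of the circle
rho_bottom = implicit_y(0.61, -b)    # follows the bottom half
print(rho_top, math.sqrt(1 - 0.61 ** 2))
```

Starting from $y = 0$, as at the points $(1, 0)$ and $(-1, 0)$, the first Newton step would divide by zero, which mirrors the failure of the theorem at those two points.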

Now to sketch the main ideas of the proof, whose outline we got from
[103]. In fact, this theorem is a fairly easy consequence of the Inverse
Function Theorem. For ease of notation, write the $k$-tuple $(f_1(x, y), \ldots, f_k(x, y))$
as $f(x, y)$. Define a new function $F : \mathbb{R}^{n+k} \to \mathbb{R}^{n+k}$ by

$$F(x, y) = (x, f(x, y)).$$

The Jacobian of this map is the $(n + k) \times (n + k)$ matrix

$$\begin{pmatrix} I & 0 \\ * & M \end{pmatrix}.$$

Here the $I$ is the $n \times n$ identity matrix, $M$ is the $k \times k$ matrix of partials as
in the theorem, $0$ is the $n \times k$ zero matrix and $*$ is some $k \times n$ matrix. Then
the determinant of the Jacobian will be the determinant of the matrix $M$;
hence the Jacobian is invertible if and only if the matrix $M$ is invertible.
By the Inverse Function Theorem, there will be a map $G : \mathbb{R}^{n+k} \to \mathbb{R}^{n+k}$
which will locally, in a neighborhood of the point $(a, b)$, be the inverse of
the map $F(x, y) = (x, f(x, y))$.

Let this inverse map $G : \mathbb{R}^{n+k} \to \mathbb{R}^{n+k}$ be described by the real-valued
functions $G_1, \ldots, G_{n+k}$ and thus as

$$G(x, y) = (G_1(x, y), \ldots, G_{n+k}(x, y)).$$

By the nature of the map $F$, we see that for $1 \leq i \leq n$,

$$G_i(x, y) = x_i.$$

Relabel the last $k$ functions that make up the map $G$ by setting

$$\rho_i(x, y) = G_{i+n}(x, y).$$

Thus

$$G(x, y) = (x_1, \ldots, x_n, \rho_1(x, y), \ldots, \rho_k(x, y)).$$

We want to show that the functions $\rho_i(x, 0)$ are the functions the theorem
requires.

We have not yet looked at the set of points in $\mathbb{R}^{n+k}$ where the original $k$
functions $f_i$ are zero, namely the set that we earlier called $V$. The image
of $V$ under the map $F$ will be contained in the set $(x, 0)$. Then the image
$G(x, 0)$, at least locally around $(a, b)$, will be $V$. Thus we must have

$$f_1(G(x, 0)) = 0, \ldots, f_k(G(x, 0)) = 0.$$

But this just means that, writing $\rho(x, 0) = (\rho_1(x, 0), \ldots, \rho_k(x, 0))$,

$$f_1(x, \rho(x, 0)) = 0, \ldots, f_k(x, \rho(x, 0)) = 0,$$

which is exactly what we wanted to show.

Here we used the Inverse Function Theorem to prove the Implicit Func- 
tion Theorem. It is certainly possible and no harder to prove the Implicit 
Function Theorem first and then use it to prove the Inverse Function The- 
orem. 


3.6 Books 


An excellent recent book on vector calculus (and for linear algebra and 
Stokes’ Theorem) is by Hubbard and Hubbard [64]. Fleming [37] has been 
the standard reference for many years. Another, more abstract approach, is 
in Spivak’s Calculus on Manifolds [103]. Information on vector calculus for 
three variable functions is in most calculus books. A good general exercise 
is to look in a calculus text and translate the given results into the language 
of this section. 


3.7 Exercises 


1. In the plane $\mathbb{R}^2$ there are two natural coordinate systems: polar
coordinates $(r, \theta)$ with $r$ the radius and $\theta$ the angle with the x-axis and Cartesian
coordinates $(x, y)$.

The functions that give the change of variables from polar to Cartesian
coordinates are:

$$x = f(r, \theta) = r\cos(\theta)$$
$$y = g(r, \theta) = r\sin(\theta).$$

a. Compute the Jacobian of this change of coordinates.

b. At what points is the change of coordinates not well-defined (i.e., at
what points is the change of coordinates not invertible)?

c. Give a geometric justification for your answer in part b. 

2. There are two different ways of describing degree two monic polynomials
in one variable: either by specifying the two roots or by specifying the
coefficients. For example, we can describe the same polynomial by either
stating that the roots are 1 and 2 or by writing it as $x^2 - 3x + 2$. The
relation between the roots $r_1$ and $r_2$ and the coefficients $a$ and $b$ can be
determined by noting that

$$(x - r_1)(x - r_2) = x^2 + ax + b.$$

Thus the space of all monic, degree two polynomials in one variable can be
described by coordinates in the root space $(r_1, r_2)$ or by coordinates in the
coefficient space $(a, b)$.

a. Write down the functions giving the change of coordinates from the 
root space to the coefficient space. 

b. Compute the Jacobian of the coordinate change. 

c. Find where this coordinate change is not invertible. 

d. Give a geometric interpretation to your answer in part c. 

3. Using the notation in the second question:

a. Via the quadratic formula, write down the functions giving the
change of coordinates from the coefficient space to the root space.

b-d. Answer the same questions as in problem 2, but now for this new
coordinate change.

4. Set $f(x, y) = x^2 - y^2$.

a. Graph the curve $f(x, y) = 0$.

b. Find the Jacobian of the function $f(x, y)$ at the point $(1, 1)$. Give a
geometric interpretation of the Jacobian at this point.

c. Find the Jacobian of the function $f(x, y)$ at the point $(0, 0)$. Give a
geometric interpretation for why the Jacobian is here the zero
matrix.

5. Set $f(x, y) = x^3 - y^2$.

a. Graph the curve $f(x, y) = 0$.

b. Find the Jacobian of the function $f(x, y)$ at the point $(1, 1)$. Give a
geometric interpretation of the Jacobian at this point.

c. Find the Jacobian of the function $f(x, y)$ at the point $(0, 0)$. Give a
geometric interpretation for why the Jacobian is here the zero
matrix.


Chapter 4 


Point Set Topology 


Basic Object: Topological spaces 
Basic Map: Continuous functions 


Historically, much of point set topology was developed to understand the
correct definitions for such notions as continuity and dimension. By now,
though, these definitions permeate mathematics, frequently in areas
seemingly far removed from the traditional topological space $\mathbb{R}^n$. Unfortunately,
it is not at first apparent that these more abstract definitions are at all
useful; there needs to be an initial investment in learning the basic terms.
In the first section, these basic definitions are given. In the next section,
these definitions are applied to the topological space $\mathbb{R}^n$, where all is much
more down to earth. Then we look at metric spaces. The last section
applies these definitions to the Zariski topology of a commutative ring, which,
while natural in algebraic geometry and algebraic number theory, is not at
all similar to the topology of $\mathbb{R}^n$.


4.1 Basic Definitions 


Much of point set topology consists in developing a convenient language 
to talk about when various points in a space are near to one another and 
about the notion of continuity. The key is that the same definitions can be 
applied to many disparate branches of math. 


Definition 4.1.1 Let $X$ be a set of points. A collection of subsets
$\mathcal{U} = \{U_\alpha\}$ forms a topology on $X$ if


1. Any arbitrary union of the $U_\alpha$ is another set in the collection $\mathcal{U}$.

2. The intersection of any finite number of sets $U_\alpha$ in the collection $\mathcal{U}$
is another set in $\mathcal{U}$.

3. Both the empty set $\emptyset$ and the whole space $X$ must be in $\mathcal{U}$.


The pair $(X, \mathcal{U})$ is called a topological space.


The sets $U_\alpha$ in the collection $\mathcal{U}$ are called open sets. A set $C$ is closed if its
complement $X - C$ is open.
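On a finite set, the three conditions of the definition can be checked mechanically. The sketch below is illustrative only: the function name `is_topology` and the sample collections are assumptions. For a finite collection it suffices to test pairwise unions and intersections, since closure under all finite (and here, all) unions and intersections then follows by induction.

```python
from itertools import combinations

def is_topology(X, collection):
    """Check the three conditions of Definition 4.1.1 for a finite collection."""
    X = frozenset(X)
    opens = {frozenset(s) for s in collection}
    # Every candidate open set must be a subset of X.
    if any(not U <= X for U in opens):
        return False
    # Condition 3: the empty set and the whole space are open.
    if frozenset() not in opens or X not in opens:
        return False
    # Conditions 1 and 2, reduced to pairs because the collection is finite.
    for U, V in combinations(opens, 2):
        if (U | V) not in opens or (U & V) not in opens:
            return False
    return True

X = {1, 2, 3}
good = [set(), {1}, {1, 2}, {1, 2, 3}]
bad = [set(), {1}, {2}, {1, 2, 3}]    # the union {1, 2} is missing

print(is_topology(X, good))
print(is_topology(X, bad))
```

The reduction to pairs is special to finite collections; for an infinite collection, arbitrary unions cannot be checked this way.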


Definition 4.1.2 Let $A$ be a subset of a topological space $X$. Then the
induced topology on $A$ is described by letting the open sets on $A$ be all sets
of the form $U \cap A$, where $U$ is an open set in $X$.


A collection $\Sigma = \{U_\alpha\}$ of open sets is said to be an open cover of a
subset $A$ if $A$ is contained in the union of the $U_\alpha$.


Definition 4.1.3 The subset $A$ of a topological space $X$ is compact if given
any open cover of $A$, there is a finite subcover.


In other words, if $\Sigma = \{U_\alpha\}$ is an open cover of $A$ in $X$, then $A$ being
compact means that there are a finite number of the $U_\alpha$, denoted let’s say
by $U_1, \ldots, U_n$, such that

$$A \subset (U_1 \cup U_2 \cup \ldots \cup U_n).$$


It should not be at all apparent why this definition would be useful, much 
less important. Part of its significance will be seen in the next section when 
we discuss the Heine-Borel Theorem. 


Definition 4.1.4 A topological space $X$ is Hausdorff if given any two distinct
points $x_1, x_2 \in X$, there are two open sets $U_1$ and $U_2$ with $x_1 \in U_1$ and $x_2 \in U_2$,
but with the intersection of $U_1$ and $U_2$ empty.


Thus $X$ is Hausdorff if points can be isolated (separated) from each other
by disjoint open sets.


Definition 4.1.5 A function $f : X \to Y$ is continuous, where $X$ and $Y$
are two topological spaces, if given any open set $U$ in $Y$, then the inverse
image $f^{-1}(U)$ in $X$ must be open.


Definition 4.1.6 A topological space $X$ is connected if it is not possible
to find two nonempty open sets $U$ and $V$ in $X$ with $X = U \cup V$ and $U \cap V = \emptyset$.
Definition 4.1.7 A topological space $X$ is path connected if given any
two points $a$ and $b$ in $X$, there is a continuous map

$$f : [0, 1] \to X$$

with

$$f(0) = a \text{ and } f(1) = b.$$

Here of course

$$[0, 1] = \{x \in \mathbb{R} : 0 \leq x \leq 1\}$$

is the unit interval. To make this last definition well-defined, we would
need to put a topology on this interval $[0, 1]$, but this is not hard and will
in fact be done in the next section.

Though in the next section the standard topology on $\mathbb{R}^n$ will be
developed, we will use this topology in order to construct a topological space that
is connected but is not path connected. It must be emphasized that this is
a pathology. In most cases, connected is equivalent to path connected.

Let

$$X = \{(0, t) : -1 \leq t \leq 1\} \cup \{y = \sin(1/x) : x > 0\}.$$

Put the induced topology on $X$ from the standard topology on $\mathbb{R}^2$.
Note that there is no path connecting the point $(0, 0)$ to $(1/\pi, 0)$. In fact,
no point on the segment $\{(0, t) : -1 \leq t \leq 1\}$ can be connected by a
path to any point on the curve $\{y = \sin(1/x) : x > 0\}$. But on the other
hand, the curve $\{y = \sin(1/x) : x > 0\}$ gets arbitrarily close to the segment
$\{(0, t) : -1 \leq t \leq 1\}$ and hence there is no way to separate the two parts
by open sets.

Point set topology books would now give many further examples of 
various topological spaces which satisfy some but not all of the above con- 
ditions. Most have the feel, legitimately, of pathologies, creating in some 
the sense that all of these definitions are somewhat pedantic and not really 
essential. To counter this feel, in the last section of this chapter we will 
look at a nonstandard topology on commutative rings, the Zariski topology, 
which is definitely not a pathology. But first, in the next section, we must 
look at the standard topology on $\mathbb{R}^n$.
4.2 The Standard Topology on $\mathbb{R}^n$


Point set topology is definitely a product of the early twentieth century. 
However, long before that, people were using continuous functions and re- 
lated ideas. Even in previous chapters, definitions were given for continuous 
functions, without the need to discuss open sets and topology. In this
section we define the standard topology on $\mathbb{R}^n$ and show that the definition
of continuity given in the last chapter in terms of limits agrees with the 
definition given in the last section in terms of inverse images of open sets. 
The important point is that the open set version can be used in contexts 
for which the limit notion makes no sense. Also, in practice the open set 
version is frequently no harder to use than the limit version. 

Critical to the definition of the standard topology on $\mathbb{R}^n$ is that there is
a natural notion of distance on $\mathbb{R}^n$. Recall that the distance between two
points $a = (a_1, \ldots, a_n)$ and $b = (b_1, \ldots, b_n)$ in $\mathbb{R}^n$ is defined by

$$|a - b| = \sqrt{(a_1 - b_1)^2 + \ldots + (a_n - b_n)^2}.$$

With this, we can define a topology on $\mathbb{R}^n$ by specifying as the open sets
the following:


Definition 4.2.1 A set $U$ in $\mathbb{R}^n$ will be open if given any $a \in U$, there
is a real number $\epsilon > 0$ such that

$$\{x : |x - a| < \epsilon\}$$

is contained in $U$.


In $\mathbb{R}$, sets of the form $(a, b) = \{x : a < x < b\}$ are open, while sets of the
form $[a, b] = \{x : a \leq x \leq b\}$ are closed. Sets like $[a, b) = \{x : a \leq x < b\}$
are neither open nor closed. In $\mathbb{R}^2$, the set $\{(x, y) : x^2 + y^2 < 1\}$ is open,
while $\{(x, y) : x^2 + y^2 \leq 1\}$ is closed.
Proposition 4.2.1 The above definition of an open set will define a
topology on $\mathbb{R}^n$.


(The proof is exercise 2 at the end of the chapter.) This is called the
standard topology on $\mathbb{R}^n$.


Proposition 4.2.2 The standard topology on $\mathbb{R}^n$ is Hausdorff.


This proposition is quite obvious geometrically:





but we give a proof in order to test the definitions. 
Proof: Let $a$ and $b$ be two distinct points in $\mathbb{R}^n$. Let $d = |a - b|$ be the
distance from $a$ to $b$. Set

$$U_a = \{x \in \mathbb{R}^n : |x - a| < d/3\}$$

and

$$U_b = \{x \in \mathbb{R}^n : |x - b| < d/3\}.$$

Both $U_a$ and $U_b$ are open sets with $a \in U_a$ and $b \in U_b$. Then $\mathbb{R}^n$ will be
Hausdorff if

$$U_a \cap U_b = \emptyset.$$

Suppose that the intersection is not empty. Let $x \in U_a \cap U_b$. Then, by
using the standard trick of adding terms that sum to zero and using the
triangle inequality, we have

$$\begin{aligned} |a - b| &= |a - x + x - b| \\ &\leq |a - x| + |x - b| \\ &< \frac{d}{3} + \frac{d}{3} \\ &= \frac{2d}{3} \\ &< d. \end{aligned}$$

Since we cannot have $d = |a - b| < d$ and since the only assumption we made
is that there is a point $x$ in both $U_a$ and $U_b$, we see that the intersection
must indeed be empty. Hence the space $\mathbb{R}^n$ is Hausdorff. $\Box$

In Chapter Three, we defined a function $f : \mathbb{R}^n \to \mathbb{R}^m$ to be continuous
if, for all $a \in \mathbb{R}^n$,

$$\lim_{x \to a} f(x) = f(a),$$

meaning that given any $\epsilon > 0$, there is some $\delta > 0$ such that if $|x - a| < \delta$,
then

$$|f(x) - f(a)| < \epsilon.$$


This limit definition of continuity captures much of the intuitive idea that 
a function is continuous if it can be graphed without lifting the pen from 
the page. Certainly we want this previous definition of continuity to agree 
with our new definition that requires the inverse image of an open set to be 
open. Again, the justification for the inverse image version of continuity is 
that it can be extended to contexts where the limit version (much less the 
requirement of not lifting the pen from the page) makes no sense. 


Proposition 4.2.3 Let $f : \mathbb{R}^n \to \mathbb{R}^m$ be a function. For all $a \in \mathbb{R}^n$,

$$\lim_{x \to a} f(x) = f(a)$$

if and only if, for any open set $U$ in $\mathbb{R}^m$, the inverse image $f^{-1}(U)$ is open
in $\mathbb{R}^n$.


Proof: First assume that the inverse image of every open set in $\mathbb{R}^m$ is
open in $\mathbb{R}^n$. Let $a \in \mathbb{R}^n$. We must show that

$$\lim_{x \to a} f(x) = f(a).$$

Let $\epsilon > 0$. We must find some $\delta > 0$ so that if $|x - a| < \delta$, then

$$|f(x) - f(a)| < \epsilon.$$

Define

$$U = \{y \in \mathbb{R}^m : |y - f(a)| < \epsilon\}.$$

The set $U$ is open in $\mathbb{R}^m$. By assumption the inverse image

$$f^{-1}(U) = \{x \in \mathbb{R}^n : f(x) \in U\} = \{x \in \mathbb{R}^n : |f(x) - f(a)| < \epsilon\}$$

is open in $\mathbb{R}^n$. Since $a \in f^{-1}(U)$, there is some real number $\delta > 0$ such
that the set

$$\{x : |x - a| < \delta\}$$

is contained in $f^{-1}(U)$, by the definition of open set in $\mathbb{R}^n$. But then if
$|x - a| < \delta$, we have $f(x) \in U$, or in other words,

$$|f(x) - f(a)| < \epsilon,$$

which is what we wanted to show. Hence the inverse image version of
continuity implies the limit version.
Now assume that, for all $a \in \mathbb{R}^n$,

$$\lim_{x \to a} f(x) = f(a).$$

Let $U$ be any open set in $\mathbb{R}^m$. We need to show that the inverse image $f^{-1}(U)$ is
open in $\mathbb{R}^n$.

If $f^{-1}(U)$ is empty, we are done, since the empty set is always open.
Now assume $f^{-1}(U)$ is not empty. Let $a \in f^{-1}(U)$. Then $f(a) \in U$. Since
$U$ is open, there is a real number $\epsilon > 0$ such that the set

$$\{y \in \mathbb{R}^m : |y - f(a)| < \epsilon\}$$

is contained in the set $U$. Since $\lim_{x \to a} f(x) = f(a)$, by the definition of
limit, given this $\epsilon > 0$, there must be some $\delta > 0$ such that if $|x - a| < \delta$,
then

$$|f(x) - f(a)| < \epsilon.$$

Therefore if $|x - a| < \delta$, then $f(x) \in U$. Thus the set

$$\{x : |x - a| < \delta\}$$

is contained in the set $f^{-1}(U)$, which means that $f^{-1}(U)$ is indeed an open
set. Thus the two definitions of continuity agree. $\Box$

In the last section, a compact set was defined to be a set $A$ on which
every open cover $\Sigma = \{U_\alpha\}$ of $A$ has a finite subcover. For the standard
topology on $\mathbb{R}^n$, compactness is equivalent to the more intuitive idea that
the set is compact if it is both closed and bounded. This equivalence is the
goal of the Heine-Borel Theorem:


Theorem 4.2.1 (Heine-Borel) A subset $A$ of $\mathbb{R}^n$ is compact if and only
if it is closed and bounded.


We will first give a definition for boundedness, look at some examples and 
then sketch a proof of a special case of the theorem. 


Definition 4.2.2 A subset $A$ is bounded in $\mathbb{R}^n$ if there is some fixed real
number $r$ such that for all $x \in A$,

$$|x| < r$$

(i.e., $A$ is contained in a ball of radius $r$).
For our first example, consider the open interval $(0, 1)$ in $\mathbb{R}$, which is
certainly bounded, but is not closed. We want to show that this interval is
also not compact. Let

$$U_n = \left(\frac{1}{n}, 1 - \frac{1}{n}\right) = \left\{x : \frac{1}{n} < x < 1 - \frac{1}{n}\right\}$$

be a collection of open sets.

[Figure: the intervals $U_3$ and $U_4$ on the number line from 0 to 1, with ticks at 1/4, 1/3, 2/3 and 3/4.]

This collection will be an open cover of the interval, since every point
in $(0, 1)$ is in some $U_n$. (In fact, once a given point is in a set $U_n$, it will be
in every future set $U_{n+k}$.) But note that no finite subcollection will cover
the entire interval $(0, 1)$. Thus $(0, 1)$ cannot be compact.
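The failure of every finite subcollection can be made concrete. In the sketch below (an illustration, with the hypothetical helper names `U`, `covered` and `uncovered_point`), a finite subcollection is given by a list of indices, and the point $1/(2N)$, where $N$ is the largest index, is exhibited as a point of $(0, 1)$ that the subcollection misses. Exact rational arithmetic is used so that the strict inequalities are tested without rounding.

```python
from fractions import Fraction

def U(n):
    """Endpoints of the open interval U_n = (1/n, 1 - 1/n)."""
    return Fraction(1, n), Fraction(n - 1, n)

def covered(x, indices):
    """True if x lies in U_n for at least one n in indices."""
    return any(U(n)[0] < x < U(n)[1] for n in indices)

def uncovered_point(indices):
    """Return a point of (0, 1) missed by the finite subcollection.

    With N the largest index, 0 < 1/(2N) < 1/N <= 1/n for every n in
    indices, so 1/(2N) lies to the left of every chosen interval.
    """
    N = max(indices)
    return Fraction(1, 2 * N)

indices = [3, 4, 10, 57]
p = uncovered_point(indices)
print(p, covered(p, indices))
```

The same point $p$ is, of course, covered by some later set $U_n$ with $n > 2N$, which is why the full infinite collection is a cover.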

The next example will be of a closed but not bounded interval. Again an
explicit open cover will be given for which there is no finite subcover. The
interval $[0, \infty) = \{x : 0 \leq x\}$ is closed but is most definitely not bounded.
It also is not compact as can be seen with the following open cover:

$$U_n = (-1, n) = \{x : -1 < x < n\}.$$

The collection $\{U_n\}_{n=1}^{\infty}$ will cover $[0, \infty)$, but can contain no finite
subcover.


The proof of the Heine-Borel theorem revolves around reducing the
whole argument to the special case of showing that a closed bounded
interval on the real line is compact. (On how to reduce to this lemma, see the
rigorous proof in Spivak [103], which is where we got the following
argument.) This is the technical heart of the proof. The key idea actually pops
up in a number of different contexts, which is why we give it here.


Lemma 4.2.1 On the real line R, a closed interval [a,b] is compact. 


Proof: Let $\Sigma$ be an open cover of $[a, b]$. We need to find a finite subcover.
Define a new set

$$Y = \{x \in [a, b] : \text{there is a finite subcover in } \Sigma \text{ of the interval } [a, x]\}.$$

Our goal is to show that our interval’s endpoint $b$ is in this new set $Y$.

We will first show that $Y$ is not empty, by showing that the initial point
$a$ is in $Y$. If $x = a$, then we are interested in the trivial interval $[a, a] = \{a\}$,
a single point. Since $\Sigma$ is an open cover, there is an open set $V \in \Sigma$ with
$[a, a] \subset V$. Thus for the admittedly silly interval $[a, a]$ there is a finite
subcover, and thus $a$ is in the set $Y$, meaning that, at the least, $Y$ is not
empty.

Set $\alpha$ to be the least upper bound of $Y$. This means that there are
elements in $Y$ arbitrarily close to $\alpha$ but that no element of $Y$ is greater than
$\alpha$. (Though to show the existence of such a least upper bound involves the
subtle and important property of completeness of the real number line, it is
certainly quite reasonable intuitively that such an upper bound must exist
for any bounded set of reals.) We first show that the point $\alpha$ is itself in the
set $Y$ and, second, that $\alpha$ is in fact the endpoint $b$, which will allow us to
conclude that the interval is indeed compact.

Since $\alpha \in [a, b]$ and since $\Sigma$ is an open cover, there is an open set $U$ in
$\Sigma$ with $\alpha \in U$. Since $U$ is open in $[a, b]$, there is a positive number $\epsilon$ with

$$\{x : |x - \alpha| < \epsilon\} \subset U.$$

Since $\alpha$ is the least upper bound of $Y$, there must be an $x \in Y$ that is
arbitrarily close to but less than $\alpha$. Thus we can find an $x \in Y \cap U$ with

$$\alpha - x < \epsilon.$$

Since $x \in Y$, there is a finite subcover $U_1, \ldots, U_N$ of the interval $[a, x]$.
Then the finite collection $U_1, \ldots, U_N, U$ will cover $[a, \alpha]$. But this means,
since each open set $U_k$ and $U$ are in $\Sigma$, that the interval $[a, \alpha]$ has a finite
subcover and hence that the least upper bound $\alpha$ is in $Y$.

Now assume $\alpha < b$. We want to come up with a contradiction. We
know that $\alpha$ is in the set $Y$. Hence there is a finite subcover $U_1, \ldots, U_n$ of
the collection $\Sigma$ which will cover the interval $[a, \alpha]$. Choose the open sets
so that the point $\alpha$ is in the open set $U_n$. Since $U_n$ is open, there is an
$\epsilon > 0$ with

$$\{x : |x - \alpha| < \epsilon\} \subset U_n.$$

Since the endpoint $b$ is strictly greater than the point $\alpha$, we can actually
find a point $x$ that both is in the open set $U_n$ and satisfies

$$\alpha < x < b.$$

But then the finite subcover $U_1, \ldots, U_n$ will cover not only the interval
$[a, \alpha]$ but also the larger interval $[a, x]$, forcing the point $x$ to be in the set
$Y$. This is impossible, since $\alpha$ is the largest possible element in $Y$. Since
the only assumption that we made was that $\alpha < b$, we must have $\alpha = b$, as
desired. $\Box$

There is yet another useful formulation for compactness in $\mathbb{R}^n$.


Theorem 4.2.2 A subset $A$ in $\mathbb{R}^n$ is compact if every infinite sequence
$(x_n)$ of points in $A$ has a subsequence converging to a point in $A$. Thus,
if $(x_n)$ is a collection of points in $A$, there must be a point $p \in A$ and a
subsequence $x_{n_k}$ with $\lim_{k \to \infty} x_{n_k} = p$.


The proof is one of the exercises at the end of the chapter. 
Compactness is also critical for the following: 


Theorem 4.2.3 Let $X$ be a compact topological space and let $f : X \to \mathbb{R}$
be a continuous function. Then there is a point $p \in X$ where $f$ has a
maximum.


We give a general idea of the proof, with the details saved for the
exercises. First, we need to show that the continuous image of a compact set is
compact. Then $f(X)$ will be compact in $\mathbb{R}$ and hence must be closed and
bounded. Thus there will be a least upper bound in $f(X)$, whose inverse
image will contain the desired point $p$. A similar argument can be used to
show that any continuous function $f(x)$ on a compact set must also have a
minimum.


4.3 Metric Spaces 


The natural notion of distance on the set $\mathbb{R}^n$ is the key to the
existence of the standard topology. Luckily on many other sets similar
notions of distance (called metrics) exist; any set that has a metric
automatically has a topology.

Definition 4.3.1 A metric on a set $X$ is a function

$$\rho : X \times X \to \mathbb{R}$$

such that for all points $x, y, z \in X$ we have:
1. $\rho(x,y) \geq 0$, and $\rho(x,y) = 0$ if and only if $x = y$.
2. $\rho(x,y) = \rho(y,x)$.
3. (Triangle Inequality) $\rho(x,z) \leq \rho(x,y) + \rho(y,z)$.
The set $X$ with its metric $\rho$ is called a metric space and is denoted by
$(X, \rho)$.


Fix a metric space $(X, \rho)$.


Definition 4.3.2 A set $U$ in $X$ is open if for all points $a \in U$, there
is some real number $\epsilon > 0$ such that

$$\{x \in X : \rho(x, a) < \epsilon\}$$

is contained in $U$.


Proposition 4.3.1 The above definition for open set will define a
Hausdorff topological space on the metric space $(X, \rho)$.


The proof is similar to the corresponding proof for the standard topology
on $\mathbb{R}^n$. In fact, most of the topological facts about
$\mathbb{R}^n$ can be quite easily translated into corresponding topological
facts about any metric space. Unfortunately, as will be seen in section five,
not all natural topological spaces come from a metric.

An example of a metric that is not just the standard one on $\mathbb{R}^n$ is
given in Chapter Thirteen, where a metric and its associated topology are
used to define Hilbert spaces.


4.4 Bases for Topologies 


Warning: This section uses the notion of countability. A set is countable 
if there is a one-to-one onto mapping from the set to the natural num- 
bers. More on this is in Chapter Ten. Note that the rational numbers are 
countable while the real numbers are uncountable. 

In linear algebra, the word basis means a list of vectors in a vector space 
that generates uniquely the entire vector space. In a topology, a basis will be 
a collection of open sets that generate the entire topology. More precisely:

Definition 4.4.1 Let $X$ be a topological space. A collection of open sets
forms a basis for the topology if every open set in $X$ is the (possibly
infinite) union of sets from the collection.


For example, let $(X, \rho)$ be a metric space. For each positive integer $k$
and for each point $p \in X$, set

$$U(p,k) = \{x \in X : \rho(x, p) < 1/k\}.$$

We can show that the collection of all possible $U(p,k)$ forms a basis for
the topology of the metric space.

In practice, having a basis will allow us to reduce many topological 
calculations to calculating on sets in the basis. This will be more tractable 
if we can somehow limit the number of elements in a basis. This leads to 


Definition 4.4.2 A topological space is second countable if it has a basis 
with a countable number of elements. 


For example, $\mathbb{R}^n$, with the usual topology, is second countable. A
countable basis can be constructed as follows. For each positive integer $k$
and each $p \in \mathbb{Q}^n$ (which means that each coordinate of the point
$p$ is a rational number), define

$$U(p,k) = \{x \in \mathbb{R}^n : |x - p| < 1/k\}.$$

There are a countable number of such sets $U(p,k)$ and they can be shown
to form a basis.
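
The key step in showing that the sets U(p,k) form a basis is that every
ε-ball about a point x contains some U(p,k) that still contains x. The
following is a minimal Python sketch of that step, not part of the original
text; the function names `rational_ball_inside` and `dist` are our own, and
floating-point numbers stand in for real numbers.

```python
from fractions import Fraction
import math

def rational_ball_inside(x, eps):
    """Return (p, k) with x in U(p, k) and U(p, k) inside the eps-ball about x.

    x   : tuple of floats, standing in for a point of R^n
    eps : positive float
    p   : tuple of Fractions (a point of Q^n); k : positive integer
    """
    n = len(x)
    k = math.floor(2 / eps) + 1      # guarantees 2/k < eps
    m = k * n                        # common denominator for the coordinates of p
    # Rounding each coordinate to the nearest multiple of 1/m moves it by at
    # most 1/(2m), so |x - p| <= sqrt(n)/(2m) <= 1/(2k) < 1/k.
    p = tuple(Fraction(round(xi * m), m) for xi in x)
    return p, k

def dist(x, p):
    """Euclidean distance between a float point x and a rational point p."""
    return math.sqrt(sum((xi - float(pi)) ** 2 for xi, pi in zip(x, p)))

x, eps = (math.pi, -math.sqrt(2)), 0.01
p, k = rational_ball_inside(x, eps)
print(p, k, dist(x, p))
```

If |y − p| < 1/k, then by the triangle inequality |y − x| < 2/k < ε, so
U(p,k) lies inside the ε-ball about x, which is what the basis property
requires.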

Most reasonable topological spaces are second countable. Here is an
example of a metric space that is not second countable. It should and
does have the feel of being a pathology. Let $X$ be any uncountable set
(you can, for example, let $X$ be the real numbers). Define a metric on
$X$ by setting $\rho(x,y) = 1$ if $x \neq y$ and $\rho(x,x) = 0$. It can be
shown that this $\rho$ defines a metric on $X$ and thus defines a topology on
$X$. This topology is weird, though. Each point $x$ is itself an open set,
since the open set $\{y \in X : \rho(x,y) < 1/2\} = \{x\}$. By using the fact
that there are an uncountable number of points in $X$, we can show that this
metric space is not second countable.
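
As an illustrative check, and not a proof, the metric axioms and the claim
that each point is an open set can be tested on a finite sample of points. In
the Python sketch below the names `rho`, `is_metric_on` and `ball` are our
own, and the five sample points are an arbitrary choice standing in for the
uncountable set X.

```python
from itertools import product

def rho(x, y):
    """The metric of the example: 0 if the points agree, 1 otherwise."""
    return 0 if x == y else 1

def is_metric_on(points, d):
    """Check the three metric axioms for d on a finite list of points."""
    for x, y, z in product(points, repeat=3):
        if d(x, y) < 0 or (d(x, y) == 0) != (x == y):
            return False                     # axiom 1 fails
        if d(x, y) != d(y, x):
            return False                     # axiom 2 (symmetry) fails
        if d(x, z) > d(x, y) + d(y, z):
            return False                     # axiom 3 (triangle inequality) fails
    return True

def ball(center, radius, points, d):
    """The open ball {y : d(center, y) < radius}, restricted to the sample."""
    return {y for y in points if d(center, y) < radius}

sample = [0.0, 1.5, -2.0, 3.25, 7.0]         # a finite sample of X = R
print(is_metric_on(sample, rho))
print([ball(x, 0.5, sample, rho) for x in sample])
```

Every ball of radius 1/2 is a single point, so any basis must contain each
one-point set; with uncountably many points no countable basis can exist.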

Of course, if we use the term “second countable”, there must be a
meaning to “first countable”. A topological space $X$ is first countable if
every point $x \in X$ has a countable neighborhood basis. For this to make
sense, we need to know what a neighborhood basis is. A collection of open
sets in $X$ forms a neighborhood basis of some $x \in X$ if every open set
containing $x$ has in it an open set from the collection and if each open set
in the collection contains the point $x$. We are just mentioning this
definition for the sake of completeness. While we will later need the notion
of second countable, we will not need in this book the idea of first
countable.


4.5 Zariski Topology of Commutative Rings 


Warning: This section requires a basic knowledge of commutative ring 
theory. 

Though historically topology arose in the study of continuous functions
on $\mathbb{R}^n$, a major reason why all mathematicians can speak the
language of open, closed and compact sets is that there exist natural
topologies on many diverse mathematical structures. This section looks at
just one of these topologies. While this example (the Zariski topology for
commutative rings) is important in algebraic geometry and algebraic number
theory, there is no reason for the average mathematician to know it. It is
given here simply to show how basic topological notions can be applied in a
nonobvious way to an object besides $\mathbb{R}^n$. We will in fact see that
the Zariski topology on the ring of polynomials is not Hausdorff and hence
cannot come from a metric.

We want to associate a topological space to any commutative ring $R$.
Our topological space will be defined on the set of all prime ideals in the
ring $R$, a set that will be denoted by Spec(R). Instead of first defining
the open sets, we will start with what will be the closed sets. Let $P$ be a
prime ideal in $R$ and hence a point in Spec(R). Define closed sets to be

$$V_P = \{Q : Q \text{ is a prime ideal in } R \text{ containing } P\}.$$

Then define Spec(R) − $V_P$, where $P$ is any prime ideal, to be an open set.
The Zariski topology on Spec(R) is given by defining open sets to be the
unions and finite intersections of all sets of the form Spec(R) − $V_P$.

As will be seen in some of the examples, it is natural to call the points 
in Spec R corresponding to maximal ideals geometric points. 

Assume that the ring $R$ has no zero divisors, meaning that if
$x \cdot y = 0$, then either $x$ or $y$ must be zero. Then the element 0 will
generate a prime ideal, (0), contained in every other ideal. This ideal is
called the generic ideal and is always a bit exceptional.

Now for some examples. For the first, let the ring $R$ be the integers
$\mathbb{Z}$. The only prime ideals in $\mathbb{Z}$ are those of the form

$$(p) = \{kp : k \in \mathbb{Z}\}, \quad p \text{ a prime number},$$

and the zero ideal (0). Then Spec $\mathbb{Z}$ is the set of all prime
numbers:

    2   3   5   7   11   13   17   19   23   29   ...

and the zero ideal (0). The nonempty open sets in this topology are the
complements of a finite number of the ideals $(p)$.

For our second example, let the ring $R$ be the field of complex numbers
$\mathbb{C}$. The only two ideals are the zero ideal (0) and the whole field
itself, and of these only (0) is a prime ideal, since by the usual convention
the whole ring is not counted as prime. Thus Spec $\mathbb{C}$ is a single
point.

A more interesting example occurs by setting $R = \mathbb{C}[x]$, the ring of
one-variable polynomials with complex coefficients. We will see that as a
point set this space can be identified with the real plane $\mathbb{R}^2$ (if
we do not consider the generic ideal) but that the topology is far from the
standard topology of $\mathbb{R}^2$. Key is that all one-variable polynomials
can be factored into linear factors, by the Fundamental Theorem of Algebra;
thus every nonzero prime ideal consists of the multiples of a linear
polynomial. We denote the ideal of all of the multiples of a linear
polynomial $x - c$, with $c \in \mathbb{C}$, as:

$$(x - c) = \{f(x)(x - c) : f(x) \in \mathbb{C}[x]\}.$$

Hence, to each complex number $c = a + bi$ with $a, b \in \mathbb{R}$, there
corresponds a prime ideal $(x - c)$ and thus Spec $\mathbb{C}[x]$ is another,
more ring-theoretic description of the complex numbers. Geometrically,
Spec $\mathbb{C}[x]$ is


[Figure: the complex plane C, with the point a + bi labeled by the ideal
(x − (a + bi)).]





Note that while the zero ideal (0) is still a prime ideal in $\mathbb{C}[x]$,
it does not correspond to any point in $\mathbb{C}$; instead, it is lurking
in the background. The open sets in this topology are the complements of a
finite number of the prime ideals. But each prime ideal corresponds to a
complex number. Since the complex numbers $\mathbb{C}$ can be viewed as the
real plane $\mathbb{R}^2$, we have that an open set is the complement of a
finite number of points in the real plane. While these open sets are also
open in the standard topology on $\mathbb{R}^2$, they are far larger than any
open disc in the plane. No little $\epsilon$-disc is the complement of only a
finite number of points, and hence no such disc is open in the Zariski
topology. In fact, notice that any two nonempty Zariski open sets must
intersect. This topology cannot be Hausdorff. Since all metric spaces are
Hausdorff, this means that the Zariski topology cannot come from some metric.

Now let $R = \mathbb{C}[x,y]$ be the ring of two-variable polynomials with
complex coefficients. Besides the zero ideal (0), there are two types of
prime ideals: the maximal ideals, each of which is generated by polynomials
of the form $x - c$ and $y - d$, where $c$ and $d$ are any two complex
numbers, and nonmaximal prime ideals, each of which is generated by an
irreducible polynomial $f(x,y)$.

Note that the maximal ideals correspond to points in the complex plane
$\mathbb{C} \times \mathbb{C}$, thus justifying the term ‘geometric point’.


[Figure: C × C with axes x and y; the point (c, d) is labeled by the ideal
(x − c, y − d).]





Since each copy of the complex numbers $\mathbb{C}$ is a real plane
$\mathbb{R}^2$, $\mathbb{C} \times \mathbb{C}$ is
$\mathbb{R}^2 \times \mathbb{R}^2 = \mathbb{R}^4$. In the Zariski topology,
open sets are the complements of the zero loci of polynomials. For example,
if $f(x,y)$ is an irreducible polynomial, then the set

$$U = \{(x,y) \in \mathbb{C}^2 : f(x,y) \neq 0\}$$

is open. While Zariski open sets will still be open in the standard topology
on $\mathbb{R}^4$, the converse is most spectacularly false. Similar to the
Zariski topology on $\mathbb{C}[x]$, no $\epsilon$-ball will be open in the
Zariski topology on $\mathbb{C}[x,y]$. In fact, if $U$ and $V$ are two Zariski
open sets that are non-empty, they must intersect. Thus this is also a
non-Hausdorff space and hence cannot come from a metric space.


4.6 Books 


Point set topology’s days of glory were the early twentieth century, a time 
when some of the world’s best mathematicians were concerned with the 
correct definitions for continuity, dimension and for a topological space. 
Most of these issues have long been settled. Today, point set topology is 
overwhelmingly a tool that all mathematicians need to know. 

At the undergraduate level, it is not uncommon for a math department 
to use their point set topology class as a place to introduce students to 
proofs. Under the influence of E. H. Moore (of the University of Chicago) 
and of his student R.L. Moore (of the University of Texas, who advised an
amazing number of Ph.D. students), many schools have taught topology
under the Moore method. Using this approach, on the first day of class 
students are given a list of the definitions and theorems. On the second 
day people are asked who has proven Theorem One. If someone thinks they 
have a proof, they go to the board to present it to the class. Those who 
still want to think of a proof on their own leave the class for that part of 
the lecture. This is a powerful way to introduce students to proofs. On the 
other hand, not much material can be covered. At present, most people 
who teach using the Moore method modify it in various ways. 

Of course, this approach comes close to being absurd for people who are 
already mathematically mature and just need to be able to use the results. 
The texts of the fifties and sixties were by Kelley [72] and Dugundji [30]. 
Overwhelmingly the most popular current book is Munkres’ Topology: A 
First Course [88]. 

My own bias (a bias not shared by most) is that all the point set topology 
that most people need can be found in, for example, the chapter in Royden’s 
Real Analysis [95] on topology. 


4.7 Exercises 


1. The goal of this problem is to show that a topology on a set X can also 
be defined in terms of a collection of closed sets, as opposed to a collection 
of open sets. Let $X$ be a set of points and let $C = \{C_\alpha\}$ be a
collection of subsets of $X$. Suppose that


• Any finite union of sets in the collection $C$ must be another set in $C$.
• Any intersection of sets in $C$ must be another set in $C$.
• The empty set $\emptyset$ and the whole space $X$ must be in the collection $C$.


Call the sets in C closed and call a set U open if its complement X — U is 
closed. Show that this definition of open set will define a topology on the 
set X. 

2. Prove Proposition 4.2.1. 

3. Prove Theorem 4.2.2. 

4. Prove Theorem 4.2.3.

5. Let $V$ be the vector space of all functions

$$f : [0,1] \to \mathbb{R}$$

whose derivatives, including the one-sided derivatives at the endpoints, are
continuous functions on the interval $[0,1]$. Define

$$|f|_\infty = \sup_{x \in [0,1]} |f(x)|$$

for any function $f \in V$. For each $f \in V$ and each $\epsilon > 0$, define

$$U_f(\epsilon) = \{g \in V : |f - g|_\infty < \epsilon\}.$$

a. Show that the set of all $U_f(\epsilon)$ is a basis for a topology on the
set $V$.
b. Show that there can be no number $M$ such that for all $f \in V$,

$$\left|\frac{df}{dx}\right|_\infty < M |f|_\infty.$$


In the language of functional analysis, this means that the derivative,
viewed as a linear map, is not bounded on the space $V$. One of the main
places where serious issues involving point set topology occur is in
functional analysis, which is the study of vector spaces of various types of
functions. The study of such spaces is important in trying to solve
differential equations.


Chapter 5 


Classical Stokes’ Theorems











Basic Objects: Manifolds and boundaries 
Basic Maps: Vector-valued functions on manifolds 
Basic Goal: Function’s average over a boundary
            = Derivative’s average over interior







Stokes’ Theorem, in all of its many manifestations, comes down to equating 
the average of a function on the boundary of some geometric object with 
the average of its derivative (in a suitable sense) on the interior of the 
object. Of course, a correct statement about averages must be put into the 
language of integrals. This theorem provides a deep link between topology 
(the part about boundaries) and analysis (integrals and derivatives). It 
is also critical for much of physics, as can be seen in both its historical 
development and in the fact that for most people their first introduction to 
Stokes’ Theorem is in a course on electricity and magnetism. 

The goal of Chapter Six is to prove Stokes’ Theorem for abstract man- 
ifolds (which are, in some sense, the abstract method for dealing with ge- 
ometric objects). As will be seen, to even state this theorem takes serious 
work in building up the necessary machinery. This chapter looks at some 
special cases of Stokes’ Theorem, special cases that were known long be- 
fore people realized that there is this one general underlying theorem. For 
example, we will see that the Fundamental Theorem of Calculus is a spe- 
cial case of Stokes’ Theorem (though to prove Stokes’ Theorem, you use 
the Fundamental Theorem of Calculus; thus logically Stokes’ Theorem does 
not imply the Fundamental Theorem of Calculus). It was in the 1800s that
most of these special cases of Stokes’ Theorem were discovered, though,
again, people did not know that each of these was a special case of one
general result. These special cases are important and useful enough that
they are now standard topics in most multivariable calculus courses and 
introductory classes in electricity and magnetism. They are Green’s Theo- 
rem, the Divergence Theorem and Stokes’ Theorem. (This Stokes’ theorem 
is, though, a special case of the Stokes’ Theorem of the next chapter.) This 
chapter develops the needed mathematics for these special cases. We will 
state and sketch proofs for the Divergence Theorem and Stokes’ Theorem. 
Physical intuitions will be stressed. 

There is a great deal of overlap between the next chapter and this one. 
Mathematicians need to know both the concrete special cases of Stokes’ 
Theorem and the abstract version of Chapter Six. 


5.1 Preliminaries about Vector Calculus 


This is a long section setting up the basic definitions of vector calculus. We 
need to define vector fields, manifolds, path and surface integrals, diver- 
gence and curl. All of these notions are essential. Only then can we state 
the Divergence Theorem and Stokes’ Theorem, which are the goals of this 
chapter. 


5.1.1 Vector Fields 


Definition 5.1.1 A vector field on $\mathbb{R}^n$ is a vector-valued function

$$F : \mathbb{R}^n \to \mathbb{R}^m.$$

If $x_1, \ldots, x_n$ are coordinates for $\mathbb{R}^n$, then the vector
field $F$ will be described by $m$ real-valued functions
$f_k : \mathbb{R}^n \to \mathbb{R}$ as follows:

$$F(x_1, \ldots, x_n) = \begin{pmatrix} f_1(x_1, \ldots, x_n) \\ \vdots \\ f_m(x_1, \ldots, x_n) \end{pmatrix}.$$

A vector field is continuous if each real-valued function $f_k$ is
continuous, differentiable if each real-valued $f_k$ is differentiable, etc.

Intuitively, a vector field assigns to each point of $\mathbb{R}^n$ a vector.
Any number of physical phenomena can be captured in terms of vector fields.
In fact, they are the natural language of fluid flow, electric fields,
magnetic fields, gravitational fields, heat flow, traffic flow and much more.

For example, let $F : \mathbb{R}^2 \to \mathbb{R}^2$ be given by

$$F(x,y) = (3,1).$$

Here $f_1(x,y) = 3$ and $f_2(x,y) = 1$. On $\mathbb{R}^2$ this vector field
can be pictured by drawing in a few sample vectors.

[Figure: sample vectors of the constant field (3, 1).]

A physical example of this vector field would be wind blowing in the
direction $(3,1)$ with velocity

$$\text{length}(3,1) = \sqrt{9 + 1} = \sqrt{10}.$$

Now consider the vector field $F(x,y) = (x,y)$. Then in pictures we have:

[Figure: vectors pointing radially away from the origin.]

This could represent water flowing out from the origin $(0,0)$.

For our final example, let $F(x,y) = (-y, x)$. In pictures we have:

[Figure: vectors circulating counterclockwise about the origin.]

which might be some type of whirlpool.

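
The three example fields can be sampled directly. The short Python sketch
below is only an illustration (the function names and the sample points are
our own choices); it prints a few vectors of each field and checks two
features visible in the pictures: the constant wind has length √10, and the
whirlpool vector at (x, y) is perpendicular to the position vector (x, y).

```python
import math

def constant_wind(x, y):
    return (3.0, 1.0)

def outflow(x, y):
    return (x, y)

def whirlpool(x, y):
    return (-y, x)

def length(v):
    return math.hypot(v[0], v[1])

def dot(v, w):
    return v[0] * w[0] + v[1] * w[1]

sample_points = [(1.0, 0.0), (0.0, 2.0), (-1.0, 1.0), (0.5, -3.0)]
for (x, y) in sample_points:
    print((x, y), constant_wind(x, y), outflow(x, y), whirlpool(x, y))
```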

5.1.2 Manifolds and Boundaries 


Curves and surfaces appear all about us. Both are examples of manifolds, 
which are basically just certain naturally occurring geometric objects. The 
intuitive idea of a manifold is that, for a $k$-dimensional manifold, each
point is in a neighborhood that looks like a ball in $\mathbb{R}^k$. In the
next chapter we give three different ways for defining a manifold. In this
chapter, we will define manifolds via parametrizations. The following
definition is making rigorous the idea that locally, near any point, a
$k$-dimensional manifold looks like a ball in $\mathbb{R}^k$.


Definition 5.1.2 A differentiable manifold $M$ of dimension $k$ in
$\mathbb{R}^n$ is a set of points in $\mathbb{R}^n$ such that for any point
$p \in M$, there is a small open neighborhood $U$ of $p$, a vector-valued
differentiable function $F : \mathbb{R}^k \to \mathbb{R}^n$ and an open set
$V$ in $\mathbb{R}^k$ with

a) $F(V) = U \cap M$

b) The Jacobian of $F$ has rank $k$ at every point in $V$, where the Jacobian
of $F$ is the $n \times k$ matrix

$$\begin{pmatrix} \frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_k} \\ \vdots & & \vdots \\ \frac{\partial f_n}{\partial x_1} & \cdots & \frac{\partial f_n}{\partial x_k} \end{pmatrix}$$

with $x_1, \ldots, x_k$ a coordinate system for $\mathbb{R}^k$. The function
$F$ is called the (local) parametrization of the manifold.


Recall that the rank of a matrix is $k$ if the matrix has an invertible
$k \times k$ minor. (A minor is a submatrix of a matrix.)

A circle is a one-dimensional manifold, with a parametrization

$$F : \mathbb{R}^1 \to \mathbb{R}^2$$

given by

$$F(t) = (\cos(t), \sin(t)).$$

[Figure: the t-axis mapped onto the unit circle, with t sent to
(cos(t), sin(t)).]

Geometrically the parameter $t$ is the angle with the $x$-axis. Note that the
Jacobian of $F$ is $\begin{pmatrix} -\sin t \\ \cos t \end{pmatrix}$. Since
sine and cosine cannot simultaneously be zero, the Jacobian has rank 1.


A cone in three-space can be parametrized by

$$F(u,v) = (u, v, \sqrt{u^2 + v^2}).$$

[Figure: the (u, v)-plane mapped to the cone, with (u, v) sent to
(u, v, √(u² + v²)).]

This will be a two dimensional manifold (a surface) except at the vertex
$(0,0,0)$, for at this point the Jacobian fails to be well-defined, much less
having rank two. Note that this agrees with the picture, where certainly
the origin looks quite different than the other points.
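
To see concretely why the Jacobian is not defined at the vertex, one can
compare one-sided difference quotients of the third component
√(u² + v²). The Python sketch below is our own illustration, with
hypothetical helper names and an arbitrary step size h; it is a numerical
hint, not a proof.

```python
import math

def cone_height(u, v):
    """Third component of the cone parametrization F(u, v) = (u, v, sqrt(u^2 + v^2))."""
    return math.sqrt(u * u + v * v)

def one_sided_quotients(u0, v0, h=1e-6):
    """Difference quotients of cone_height in the u-direction, from the right and the left."""
    right = (cone_height(u0 + h, v0) - cone_height(u0, v0)) / h
    left = (cone_height(u0 - h, v0) - cone_height(u0, v0)) / (-h)
    return right, left

at_vertex = one_sided_quotients(0.0, 0.0)   # the two quotients have opposite signs
away = one_sided_quotients(1.0, 1.0)        # the two quotients nearly agree
print(at_vertex, away)
```

At the vertex the quotient from the right is +1 and from the left is −1, so
the partial derivative with respect to u does not exist there; away from the
vertex the two quotients agree up to the step size.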

Again, other definitions are given in Chapter Six. 

Now to discuss what is the boundary of a manifold. This is needed 
since Stokes’ Theorem and its many manifestations state that the average 
of a function on the boundary of a manifold will equal the average of its 
derivative on the interior. 

Let $M$ be a $k$-dimensional manifold in $\mathbb{R}^n$.

Definition 5.1.3 The closure of $M$, denoted $\overline{M}$, is the set of
all points $x$ in $\mathbb{R}^n$ such that there is a sequence of points
$(x_n)$ in the manifold $M$ with

$$\lim_{n \to \infty} x_n = x.$$

The boundary of $M$, denoted $\partial M$, is:

$$\partial M = \overline{M} - M.$$

Given a manifold with boundary, we call the nonboundary part the interior.


All of this will become relatively straightforward with a few examples.
Consider the map

$$r : [-1,2] \to \mathbb{R}^2$$

where

$$r(t) = (t, t^2).$$

The image under $r$ of the open interval $(-1,2)$ is a one-manifold (since
the Jacobian is the $2 \times 1$ matrix $(1, 2t)$, which always has rank
one). The boundary consists of the two points $r(-1) = (-1,1)$ and
$r(2) = (2,4)$.

Our next example is a two-manifold having a boundary consisting of a
circle. Let

$$r : \{(x,y) \in \mathbb{R}^2 : x^2 + y^2 \leq 1\} \to \mathbb{R}^3$$

be defined by

$$r(x,y) = (x, y, x^2 + y^2).$$

The image of $r$ is a bowl in space sitting over the unit disc in the plane.

Now the image under $r$ of the open disc
$\{(x,y) \in \mathbb{R}^2 : x^2 + y^2 < 1\}$ is a two-manifold (since the
Jacobian is

$$\begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 2x & 2y \end{pmatrix},$$

which has rank two at all points). The boundary is the image of the boundary
of the disc and hence the image of the circle
$\{(x,y) \in \mathbb{R}^2 : x^2 + y^2 = 1\}$. In this case, as can be seen by
the picture, the boundary is itself a circle living on the plane $z = 1$ in
space.

Another example is the unit circle in the plane. We saw that this is a
one-manifold. There are no boundary points, though. On the other hand,
the unit circle is itself the boundary of a two-manifold, namely the unit
disc in the plane. In a similar fashion, the unit sphere in $\mathbb{R}^3$ is
a two-manifold, with no boundary, that is itself the boundary of the unit
ball, a three-manifold. (It is not chance that in these two cases the
boundary of the boundary is the empty set.)

We will frequently call a manifold with boundary simply a manifold.
We will also usually be making the assumption that the boundary of an
$n$-dimensional manifold will either be empty (in which case the manifold
has no boundary) or is itself an $(n-1)$-dimensional manifold.


5.1.3 Path Integrals 


Now that we have a sharp definition for manifolds, we want to do calculus 
on them. We start with integrating vector fields along curves. This process 
is called a path integral or sometimes, misleadingly, a line integral. 

A curve or path $C$ in $\mathbb{R}^n$ is defined to be a one-manifold with
boundary. Thus all curves are defined by maps
$F : [a,b] \to \mathbb{R}^n$, given by

$$F(t) = \begin{pmatrix} f_1(t) \\ \vdots \\ f_n(t) \end{pmatrix}.$$

These maps are frequently written as

$$\begin{pmatrix} x_1(t) \\ \vdots \\ x_n(t) \end{pmatrix}.$$

We will require each component function $f_i : \mathbb{R} \to \mathbb{R}$ to
be differentiable.


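Definition 5.1.4 Let $f(x_1, \ldots, x_n)$ be a real-valued function defined
on $\mathbb{R}^n$. The path integral of the function $f$ along the curve $C$
is

$$\int_C f \, ds = \int_a^b f(x_1(t), \ldots, x_n(t)) \sqrt{\left(\frac{dx_1}{dt}\right)^2 + \cdots + \left(\frac{dx_n}{dt}\right)^2} \, dt.$$

Note that

$$\int_a^b f(x_1(t), \ldots, x_n(t)) \sqrt{\left(\frac{dx_1}{dt}\right)^2 + \cdots + \left(\frac{dx_n}{dt}\right)^2} \, dt,$$

while looking quite messy, is an integral of the single variable $t$.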


Theorem 5.1.1 Let a curve $C$ in $\mathbb{R}^n$ be described by two different
parametrizations

$$F : [a,b] \to \mathbb{R}^n$$

and

$$G : [c,d] \to \mathbb{R}^n,$$

with $F(t) = (x_1(t), \ldots, x_n(t))$ and $G(u) = (y_1(u), \ldots, y_n(u))$.
The path integral $\int_C f \, ds$ is independent of parametrization chosen,
i.e.,

$$\int_a^b f(x_1(t), \ldots, x_n(t)) \sqrt{\left(\frac{dx_1}{dt}\right)^2 + \cdots + \left(\frac{dx_n}{dt}\right)^2} \, dt = \int_c^d f(y_1(u), \ldots, y_n(u)) \sqrt{\left(\frac{dy_1}{du}\right)^2 + \cdots + \left(\frac{dy_n}{du}\right)^2} \, du.$$

While we will do an example in a moment, the proof is an exercise in, and
depends critically on, the chain rule. In fact, the path integral was defined
with the awkward term

$$\sqrt{\left(\frac{dx_1}{dt}\right)^2 + \cdots + \left(\frac{dx_n}{dt}\right)^2} \, dt$$

precisely in order to make the path integral independent of parametrization.
This is why $\int_a^b f(x_1(t), \ldots, x_n(t)) \, dt$ is an incorrect
definition for the path integral.

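The symbol “ds” represents the infinitesimal arc length on the curve
$C$ in $\mathbb{R}^n$. In pictures, for $\mathbb{R}^2$, consider the
following.

[Figure: a short piece of the curve of length Δs, drawn as the hypotenuse of
a right triangle with legs Δx₁ and Δx₂, so that
Δs ≈ √((Δx₁)² + (Δx₂)²).]

With $\Delta s$ denoting the change in position along the curve $C$, we have
by the Pythagorean Theorem

$$\Delta s \approx \sqrt{(\Delta x_1)^2 + (\Delta x_2)^2} = \sqrt{\left(\frac{\Delta x_1}{\Delta t}\right)^2 + \left(\frac{\Delta x_2}{\Delta t}\right)^2} \, \Delta t.$$

Then in the limit as $\Delta t \to 0$, we have, at least formally,

$$ds = \sqrt{\left(\frac{dx_1}{dt}\right)^2 + \left(\frac{dx_2}{dt}\right)^2} \, dt.$$

Thus the correct implementation of the Pythagorean Theorem will also
force on us the term
$ds = \sqrt{\left(\frac{dx_1}{dt}\right)^2 + \cdots + \left(\frac{dx_n}{dt}\right)^2} \, dt$
in the definition of the path integral.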

Now for an example, in order to check our working knowledge of the
definitions and also to see how the $ds$ term is needed to make path
integrals independent of parametrizations. Consider the straight line segment
in the plane from $(0,0)$ to $(1,2)$. We will parametrize this line segment in
two different ways, and then compute the path integral of the function

$$f(x,y) = x^2 + 3y$$

using each of the parametrizations.
First, define

$$F : [0,1] \to \mathbb{R}^2$$

by

$$F(t) = (t, 2t).$$

Thus we have $x(t) = t$ and $y(t) = 2t$. Denote this line segment by $C$.
Then

$$\int_C f(x,y) \, ds = \int_0^1 \left(x(t)^2 + 3y(t)\right) \sqrt{\left(\frac{dx}{dt}\right)^2 + \left(\frac{dy}{dt}\right)^2} \, dt$$

$$= \int_0^1 (t^2 + 6t) \sqrt{5} \, dt$$

$$= \sqrt{5} \left( \frac{t^3}{3} \Big|_0^1 + 3t^2 \Big|_0^1 \right)$$

$$= \sqrt{5} \left( \frac{1}{3} + 3 \right)$$

$$= \frac{10}{3} \sqrt{5}.$$

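Now parametrize the segment $C$ by:

$$G : [0,2] \to C$$

where

$$G(t) = \left( \frac{t}{2}, t \right).$$

Here we have $x(t) = t/2$ and $y(t) = t$. Then

$$\int_C f(x,y) \, ds = \int_0^2 \left(x(t)^2 + 3y(t)\right) \sqrt{\left(\frac{dx}{dt}\right)^2 + \left(\frac{dy}{dt}\right)^2} \, dt$$

$$= \int_0^2 \left( \frac{t^2}{4} + 3t \right) \sqrt{\frac{1}{4} + 1} \, dt$$

$$= \frac{\sqrt{5}}{2} \left( \frac{t^3}{12} \Big|_0^2 + \frac{3t^2}{2} \Big|_0^2 \right)$$

$$= \frac{\sqrt{5}}{2} \left( \frac{8}{12} + 6 \right)$$

$$= \frac{10}{3} \sqrt{5},$$

as desired.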
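
The same comparison can be made numerically. The Python sketch below is our
own illustration: it approximates the path integral of f(x, y) = x² + 3y over
the segment from (0,0) to (1,2) with both parametrizations, using a midpoint
rule in t and central differences for the derivatives (the function names,
the number of subintervals and the step size are all arbitrary choices). It
also evaluates the "incorrect definition" without the ds factor, which does
depend on the parametrization.

```python
import math

def path_integral(f, curve, a, b, n=1000, h=1e-6):
    """Approximate the path integral of f along curve(t), a <= t <= b.

    Midpoint rule in t; dx/dt and dy/dt are approximated by central differences.
    """
    dt = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * dt
        x, y = curve(t)
        x1, y1 = curve(t + h)
        x0, y0 = curve(t - h)
        speed = math.hypot((x1 - x0) / (2 * h), (y1 - y0) / (2 * h))
        total += f(x, y) * speed * dt
    return total

def naive_integral(f, curve, a, b, n=1000):
    """Integral of f(x(t), y(t)) dt with no arc length factor (the incorrect definition)."""
    dt = (b - a) / n
    return sum(f(*curve(a + (i + 0.5) * dt)) * dt for i in range(n))

f = lambda x, y: x ** 2 + 3 * y
F = lambda t: (t, 2 * t)       # first parametrization, 0 <= t <= 1
G = lambda t: (t / 2, t)       # second parametrization, 0 <= t <= 2

with_F = path_integral(f, F, 0.0, 1.0)
with_G = path_integral(f, G, 0.0, 2.0)
exact = 10 * math.sqrt(5) / 3
print(with_F, with_G, exact)
print(naive_integral(f, F, 0.0, 1.0), naive_integral(f, G, 0.0, 2.0))
```

Both approximations agree with (10/3)√5 to within the discretization error,
while the two values computed without the arc length factor differ from each
other.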


5.1.4 Surface Integrals 


Now to integrate along surfaces. A surface in $\mathbb{R}^3$ is a
two-manifold with boundary. For the sake of simplicity, we will restrict our
attention to those surfaces which are the image of a map

$$r : D \to \mathbb{R}^3,$$

given by

$$r(u,v) = (x(u,v), y(u,v), z(u,v)),$$

where $x, y, z$ are coordinates for $\mathbb{R}^3$ and $u, v$ are coordinates
for $\mathbb{R}^2$. Here $D$ is a domain in the plane, which means that there
is an open set $U$ in $\mathbb{R}^2$ whose closure is $D$. (If you think of
$U$ as an open disc and $D$ as a closed disc, you usually will not go wrong.)


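Definition 5.1.5 Let $f(x,y,z)$ be a function on $\mathbb{R}^3$. Then the
integral of $f(x,y,z)$ along the surface $S$ is

$$\iint_S f(x,y,z) \, dS = \iint_D f(x(u,v), y(u,v), z(u,v)) \left| \frac{\partial r}{\partial u} \times \frac{\partial r}{\partial v} \right| \, du \, dv.$$

Here $\left| \frac{\partial r}{\partial u} \times \frac{\partial r}{\partial v} \right|$
denotes the length of the cross product (which in a moment we will show to be
the length of a certain normal vector) of the vectors
$\frac{\partial r}{\partial u}$ and $\frac{\partial r}{\partial v}$, and is
hence the length of the vector given by the formal determinant

$$\frac{\partial r}{\partial u} \times \frac{\partial r}{\partial v} = \det \begin{pmatrix} i & j & k \\ \partial x/\partial u & \partial y/\partial u & \partial z/\partial u \\ \partial x/\partial v & \partial y/\partial v & \partial z/\partial v \end{pmatrix}.$$

Thus the infinitesimal area $dS$ is:

$$\text{length of } \left( \frac{\partial r}{\partial u} \times \frac{\partial r}{\partial v} \right) du \, dv = \sqrt{ \left( \frac{\partial y}{\partial u} \frac{\partial z}{\partial v} - \frac{\partial z}{\partial u} \frac{\partial y}{\partial v} \right)^2 + \left( \frac{\partial x}{\partial u} \frac{\partial z}{\partial v} - \frac{\partial z}{\partial u} \frac{\partial x}{\partial v} \right)^2 + \left( \frac{\partial x}{\partial u} \frac{\partial y}{\partial v} - \frac{\partial y}{\partial u} \frac{\partial x}{\partial v} \right)^2 } \, du \, dv.$$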
In analogy with arc length, a surface integral is independent of
parametrization:


Theorem 5.1.2 The integral $\iint_S f(x,y,z) \, dS$ is independent of the
parametrization of the surface $S$.


Again, the chain rule is a critical part of the proof. 

Note that if this theorem were not true, we would define the surface 
integral (in particular the infinitesimal area) differently. 
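
A numerical experiment makes this independence plausible. The Python sketch
below is our own illustration, with an example surface that is not from the
text: the piece of the plane z = x + y lying over the unit square,
parametrized in two ways. It approximates the surface integral of
f(x, y, z) = x² + 3y with a midpoint rule, computing the infinitesimal area
from the cross product of central-difference approximations to ∂r/∂u and
∂r/∂v.

```python
import math

def cross(a, b):
    """Cross product of two vectors in R^3."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def surface_integral(f, r, u_range, v_range, n=100, h=1e-6):
    """Midpoint-rule approximation of the integral of f over the surface r(D).

    r maps (u, v) to a point (x, y, z); D is the rectangle u_range x v_range.
    """
    (u0, u1), (v0, v1) = u_range, v_range
    du, dv = (u1 - u0) / n, (v1 - v0) / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            u, v = u0 + (i + 0.5) * du, v0 + (j + 0.5) * dv
            r_u = tuple((p - q) / (2 * h) for p, q in zip(r(u + h, v), r(u - h, v)))
            r_v = tuple((p - q) / (2 * h) for p, q in zip(r(u, v + h), r(u, v - h)))
            normal = cross(r_u, r_v)
            dS = math.sqrt(sum(c * c for c in normal)) * du * dv
            total += f(*r(u, v)) * dS
    return total

f = lambda x, y, z: x ** 2 + 3 * y
r1 = lambda u, v: (u, v, u + v)                   # (u, v) in [0,1] x [0,1]
r2 = lambda u, v: (u / 2, v / 3, u / 2 + v / 3)   # (u, v) in [0,2] x [0,3]

I1 = surface_integral(f, r1, (0.0, 1.0), (0.0, 1.0))
I2 = surface_integral(f, r2, (0.0, 2.0), (0.0, 3.0))
exact = math.sqrt(3) * (1 / 3 + 3 / 2)            # sqrt(3) times the integral of u^2 + 3v
print(I1, I2, exact)
```

For the first parametrization the cross product is (−1, −1, 1), of length
√3; for the second it is (−1/6, −1/6, 1/6), of length √3/6, which exactly
compensates for the domain being six times larger.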

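We now show how the vector field

$$\frac{\partial r}{\partial u} \times \frac{\partial r}{\partial v}$$

is actually a normal to the surface. With the map
$r : \mathbb{R}^2 \to \mathbb{R}^3$ given by
$r(u,v) = (x(u,v), y(u,v), z(u,v))$, recall that the Jacobian of $r$ is

$$\begin{pmatrix} \partial x/\partial u & \partial x/\partial v \\ \partial y/\partial u & \partial y/\partial v \\ \partial z/\partial u & \partial z/\partial v \end{pmatrix}.$$

But as we saw in Chapter Three, the Jacobian maps tangent vectors to
tangent vectors. Thus the two vectors

$$\left( \frac{\partial x}{\partial u}, \frac{\partial y}{\partial u}, \frac{\partial z}{\partial u} \right)$$

and

$$\left( \frac{\partial x}{\partial v}, \frac{\partial y}{\partial v}, \frac{\partial z}{\partial v} \right)$$

are both tangent vectors to the surface $S$. Hence their cross product must
be a normal (perpendicular) vector $n$. Thus we can interpret the surface
integral as

$$\iint_S f \, dS = \iint f \cdot |n| \, du \, dv$$

with $dS = \left(\text{length of the normal vector } \frac{\partial r}{\partial u} \times \frac{\partial r}{\partial v}\right) du \, dv$.


5.1.5 The Gradient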


The gradient of a function can be viewed as a method for differentiating 
functions. 

Definition 5.1.6 The gradient of a real-valued function
$f(x_1, \ldots, x_n)$ is

$$\nabla f = \left( \frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n} \right).$$

Thus

$$\nabla : (\text{Functions}) \to (\text{Vector fields}).$$

For example, if $f(x,y,z) = x^3 + 2xy + 3xz$, we have

$$\nabla(f) = (3x^2 + 2y + 3z, 2x, 3x).$$

It can be shown that at all points on
$M = (f(x_1, \ldots, x_n) = 0)$ where $\nabla f \neq 0$, the gradient
$\nabla f$ is a normal vector to $M$.


5.1.6 The Divergence 


The divergence of a vector field can be viewed as a reasonable way to
differentiate a vector field. (In the next section we will see that the curl
of a vector field is another way.) Let
$F(x,y,z) : \mathbb{R}^3 \to \mathbb{R}^3$ be a vector field given by three
functions as follows:

$$F(x,y,z) = (f_1(x,y,z), f_2(x,y,z), f_3(x,y,z)).$$

Definition 5.1.7 The divergence of $F(x,y,z)$ is

$$\text{div}(F) = \frac{\partial f_1}{\partial x} + \frac{\partial f_2}{\partial y} + \frac{\partial f_3}{\partial z}.$$

Thus

$$\text{div} : (\text{Vector fields}) \to (\text{Functions}).$$

The Divergence Theorem will tell us that the divergence measures how
much the vector field is spreading out at a point.
For example, let $F(x,y,z) = (x, y^2, 0)$. Then

$$\text{div}(F) = \frac{\partial x}{\partial x} + \frac{\partial (y^2)}{\partial y} + \frac{\partial (0)}{\partial z} = 1 + 2y.$$

If you sketch out this vector field, you do indeed see that the larger the
$y$ value, the more spread out the vector field becomes.


5.1.7 The Curl


The curl of a vector field is another way in which we can extend the idea of
differentiation to vector fields. Stokes’ Theorem will show us that the curl
of a vector field measures how much the vector field is twirling or whirling
or curling about. The actual definition is:


Definition 5.1.8 The curl of a vector field $F(x,y,z)$ is

$$\text{curl}(F) = \det \begin{pmatrix} i & j & k \\ \frac{\partial}{\partial x} & \frac{\partial}{\partial y} & \frac{\partial}{\partial z} \\ f_1 & f_2 & f_3 \end{pmatrix} = \left( \frac{\partial f_3}{\partial y} - \frac{\partial f_2}{\partial z}, \; -\left( \frac{\partial f_3}{\partial x} - \frac{\partial f_1}{\partial z} \right), \; \frac{\partial f_2}{\partial x} - \frac{\partial f_1}{\partial y} \right).$$

Note that

$$\text{curl} : (\text{Vector fields}) \to (\text{Vector fields}).$$

Now to look at an example and see that the curl is indeed measuring
some sort of twirling. Earlier we saw that the vector field
$F(x,y,z) = (-y, x, 0)$ looks like a whirlpool. Its curl is:

$$\text{curl}(F) = \det \begin{pmatrix} i & j & k \\ \frac{\partial}{\partial x} & \frac{\partial}{\partial y} & \frac{\partial}{\partial z} \\ -y & x & 0 \end{pmatrix} = (0, 0, 2),$$

which reflects that the whirlpool action is in the $xy$-plane, perpendicular
to the $z$-axis.

We will see in the statement of Stokes’ Theorem that intuitively the
length of the curl($F$) indeed measures how much the vector field is twirling
about while the vector curl($F$) points in the direction normal to the
twirling.
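
The worked examples for the gradient, the divergence and the curl can all be
checked with central differences. The Python sketch below is our own
illustration; the helper names, the step size H and the test point p are
arbitrary choices, and the function f is taken to be x³ + 2xy + 3xz, which is
the reading consistent with the gradient stated in the text.

```python
H = 1e-5  # step size for the central differences

def partial(g, point, i):
    """Central-difference approximation of the i-th partial derivative of g at point."""
    up = list(point)
    up[i] += H
    down = list(point)
    down[i] -= H
    return (g(*up) - g(*down)) / (2 * H)

def gradient(f, point):
    return tuple(partial(f, point, i) for i in range(3))

def divergence(F, point):
    # Differentiate the i-th component of F with respect to the i-th variable.
    return sum(partial(lambda *q, i=i: F(*q)[i], point, i) for i in range(3))

def curl(F, point):
    # d(comp, var) approximates the derivative of component comp in variable var.
    d = lambda comp, var: partial(lambda *q: F(*q)[comp], point, var)
    return (d(2, 1) - d(1, 2), d(0, 2) - d(2, 0), d(1, 0) - d(0, 1))

f = lambda x, y, z: x ** 3 + 2 * x * y + 3 * x * z
spread = lambda x, y, z: (x, y ** 2, 0.0)      # divergence example: 1 + 2y
whirl = lambda x, y, z: (-y, x, 0.0)           # curl example: (0, 0, 2)

p = (1.0, 2.0, -1.0)
print(gradient(f, p))       # compare with (3x^2 + 2y + 3z, 2x, 3x) at p
print(divergence(spread, p))
print(curl(whirl, p))
```

At the point p = (1, 2, −1) the formulas in the text give the gradient
(4, 2, 3), the divergence 1 + 2·2 = 5 and the curl (0, 0, 2), which the
numerical values should match up to the error of the difference quotients.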


5.1.8 Orientability 


We also require our manifolds to be orientable. For a surface, orientability 
means that we can choose a normal vector field on the surface that varies 
continuously and never vanishes. For a curve, orientability means that we 
can choose a unit tangent vector, at each point, that varies continuously. 

The standard example of a nonorientable surface is the Möbius strip,
obtained by putting a half twist in a strip of paper and then attaching the
ends.

For an orientable manifold, there are always two choices of orientation,
depending on which direction is chosen for the normal or the tangent. Further,
an oriented surface $S$ with boundary curve $\partial S$ will induce an
orientation on $\partial S$, as will a 3-dimensional region induce an
orientation on its boundary surface. If you happen to choose the wrong
induced orientation for a boundary, the various versions of Stokes’ Theorem
will be off merely by a factor of $(-1)$. Do not panic if you found the last
few paragraphs vague. They were deliberately so. To actually rigorously
define orientation takes a little work. In first approaching the subject, it
is best to concentrate on the basic examples and only then worry about the
correct sign coming from the induced orientations. Rigorous definitions for
orientability are given in the next chapter.


5.2 The Divergence Theorem and Stokes’ 
Theorem 


(For technical convenience, we will assume for the rest of this chapter that 
all functions, including those that make up vector fields, have as many 
derivatives as needed.) 

The whole goal of this chapter is to emphasize that there must always be 
a deep link between the values of a function on the boundary of a manifold 
with the values of its derivative (suitably defined) on the interior of the 
manifold. This link is already present in 


Theorem 5.2.1 (The Fundamental Theorem of Calculus) Let 
    f : [a, b] \to R 
be a real-valued differentiable function on the interval [a, b]. Then 


    f(b) - f(a) = \int_a^b \frac{df}{dx}(x) \, dx. 


Here the derivative df/dx is integrated over the interval 


    [a, b] = \{ x \in R : a \le x \le b \}, 

which has as its boundary the points (a) and (b). The orientation on the 
boundary will be b and -a, or 


    \partial [a, b] = b - a. 


Then the Fundamental Theorem of Calculus can be interpreted as stating 
that the value of f(x) on the boundary is equal to the average (the integral) 
of the derivative over the interior. 

One possible approach to generalizing the Fundamental Theorem is to 
replace the one-dimensional interval [a,b] with something higher dimen- 
sional and replace the one variable function f with either a function of 
more than one variable or (less obviously) by a vector field. The correct 
generalizations will of course be determined by what can be proven. 

In the divergence theorem, the interval becomes a three-dimensional 
manifold, whose boundary is a surface, and the function f becomes a vector 
field. The derivative of f will here be the divergence. More precisely: 


Theorem 5.2.2 (The Divergence Theorem) In R^3, let M be a 
three-dimensional manifold with boundary ∂M a compact manifold of dimension 
two. Let F(x, y, z) denote a vector field on R^3 and let n(x, y, z) denote a 
unit normal vector field to the boundary surface ∂M. Then 


    \iint_{\partial M} F \cdot n \, dS = \iiint_M (\mathrm{div}\, F) \, dx \, dy \, dz. 


We will sketch a proof in section 5.5. 

On the left hand side we have an integral of the vector field F over 
the boundary. On the right hand side we have an integral of the function 
div(F) (which involves derivatives of the vector field) over the interior. 
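To see the two sides agree in a concrete case, here is a minimal numerical sketch on the unit cube [0, 1]^3. The field F = (xy, yz, zx), the midpoint rule and the grid size are assumptions made for this illustration only; the cube's boundary has edges, so strictly this is the piecewise-smooth version of the statement.

```python
def midpoints(n):
    """Midpoints of n equal subintervals of [0, 1]."""
    return [(i + 0.5) / n for i in range(n)]


def F(x, y, z):
    # Illustrative field, chosen for this sketch (not from the text).
    return (x * y, y * z, z * x)


def div_F(x, y, z):
    # Divergence of F above, computed by hand: d(xy)/dx + d(yz)/dy + d(zx)/dz.
    return y + z + x


def flux_through_unit_cube(field, n=20):
    """Midpoint-rule approximation of the outward flux through the six faces."""
    pts = midpoints(n)
    dA = 1.0 / n ** 2
    total = 0.0
    for axis in range(3):                          # faces perpendicular to x, y, z
        for side, sign in ((0.0, -1.0), (1.0, 1.0)):
            for u in pts:
                for v in pts:
                    p = [u, v]
                    p.insert(axis, side)           # a point on this face
                    # Outward unit normal is sign * e_axis, so F . n = sign * F[axis].
                    total += sign * field(*p)[axis] * dA
    return total


def integral_over_unit_cube(g, n=20):
    """Midpoint-rule approximation of the triple integral of g over the cube."""
    pts = midpoints(n)
    dV = 1.0 / n ** 3
    return sum(g(x, y, z) for x in pts for y in pts for z in pts) * dV


boundary_side = flux_through_unit_cube(F)
interior_side = integral_over_unit_cube(div_F)
print(boundary_side, interior_side)  # both close to 3/2
```

A hand computation gives 3/2 for both integrals, so the printed values should agree with each other and with that number up to rounding.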

In Stokes’ Theorem, the interval becomes a surface, so that the bound- 
ary is a curve, and the function again becomes a vector field. The role of 
the derivative though will now be played by the curl of the vector field. 


Theorem 5.2.3 (Stokes’ Theorem) Let M be a surface in R^3 with compact 
boundary curve ∂M. Let n(x, y, z) be the unit normal vector field to 
M and let T(x, y, z) denote the induced unit tangent vector to the curve 
∂M. If F(x, y, z) is any vector field, then 


    \int_{\partial M} F \cdot T \, ds = \iint_M \mathrm{curl}(F) \cdot n \, dS. 


As with the Divergence Theorem, a sketch of the proof will be given later 
in this chapter. 

Again, on the left hand side we have an integral involving a vector field 
F on the boundary while on the right hand side we have an integral on the 
interior involving the curl of F (which is in terms of the various derivatives 
of F). 
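Both sides of Stokes’ Theorem can be approximated for the whirlpool field F = (-y, x, 0) on the unit disc in the xy-plane. This is a minimal sketch: the choice of the unit disc, the normal n = (0, 0, 1), the counterclockwise boundary orientation and the quadrature sizes are assumptions for illustration.

```python
import math


def F(x, y, z):
    # The whirlpool field from the text, with curl(F) = (0, 0, 2).
    return (-y, x, 0.0)


def circulation_unit_circle(field, n=2000):
    """Approximate the integral of F . T ds around the unit circle, counterclockwise."""
    total = 0.0
    dt = 2 * math.pi / n
    for i in range(n):
        t = (i + 0.5) * dt
        tangent = (-math.sin(t), math.cos(t), 0.0)      # unit tangent T
        f = field(math.cos(t), math.sin(t), 0.0)
        # On the unit circle ds = dt.
        total += sum(fi * ti for fi, ti in zip(f, tangent)) * dt
    return total


def curl_flux_unit_disc(curl_z, n=400):
    """Approximate the integral of curl(F) . n dS over the unit disc, in polar coordinates."""
    total = 0.0
    dr, dth = 1.0 / n, 2 * math.pi / n
    for i in range(n):
        r = (i + 0.5) * dr
        for j in range(n):
            th = (j + 0.5) * dth
            total += curl_z(r * math.cos(th), r * math.sin(th)) * r * dr * dth
    return total


boundary_side = circulation_unit_circle(F)
interior_side = curl_flux_unit_disc(lambda x, y: 2.0)   # z-component of curl(F)
print(boundary_side, interior_side, 2 * math.pi)
```

Both numbers should come out close to 2π, which is the length of the curl (namely 2) times the area of the disc.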

Although both the Divergence Theorem and Stokes’ Theorem were 
proven independently, their similarity is more than a mere analogy; both 
are special cases, as is the Fundamental Theorem of Calculus, of one very 
general theorem, which is the goal of the next chapter. The proofs of each 
are also quite similar. There are in fact two basic methods for proving these 
types of theorems. The first is to reduce to the Fundamental Theorem of 
Calculus, f(b) - f(a) = \int_a^b \frac{df}{dx} \, dx. This method will be illustrated in our 
sketch of the Divergence Theorem. 

The second method involves two steps. Step one is to show that given 
two regions R_1 and R_2 that share a common boundary, we have 


    \int_{\partial R_1} \text{function} + \int_{\partial R_2} \text{function} = \int_{\partial (R_1 \cup R_2)} \text{function}. 


Step two is to show that the theorem is true on infinitesimally small regions. 
To prove the actual theorem by this approach, simply divide the original 
region into infinitely many infinitesimally small regions, apply step two and 
then step one. We take this approach in our sketch of Stokes’ Theorem. 
Again, all of these theorems are really the same. In fact, to most mathe- 
maticians, these theorems usually go by the single name “Stokes’ Theorem”. 


5.3 A Physical Interpretation of the Divergence Theorem 


The goal of this section is to give a physical meaning to the Divergence 
Theorem, which was, in part, historically how the theorem was discovered. 
We will see that the Divergence Theorem states that the flux of a vector 
field through a surface is precisely equal to the sum of the divergences of 
each point of the interior. Of course, we need to give some definitions to 
these terms. 


Definition 5.3.1 Let S be a surface in R^3 with unit normal vector field 
n(x, y, z). Then the flux of a vector field F(x, y, z) through the surface S is 


    \iint_S F \cdot n \, dS. 


Intuitively we want the flux to measure how much of the vector field F 
pushes through the surface S. 

Imagine a stream of water flowing along. The tangent vector of the 
direction of the water at each point defines a vector field F(x, y, z). Suppose 
the vector field F is: 


[Figure: the vector field F of the stream, drawn as arrows] 


Place into the stream an infinitely thin sheet of rubber, let us say. We want 
the flux to measure how hard it is to hold this sheet in place against the 
flow of the water. Here are three possibilities: 


[Figure: three placements of the rubber sheet in the stream, labelled A, B and C] 


In case A, the water is hitting the rubber sheet head on, making it quite 
difficult to hold in place. In case C, no effort is needed to hold the sheet 
still, as the water just flows on by. The effort needed to keep the sheet 
still in case B is seen to be roughly halfway between effort needed in cases 
A and C. The key to somehow quantifying these differences of flux is to 
measure the angle between the vector field F of the stream and the normal 
vector field n to the membrane. Clearly, the dot product F · n works. Thus, 
using that flux is defined by 


    \iint_S F \cdot n \, dS, 


the flux through surface A is greater than the flux through surface B which 
in turn is greater than the flux through surface C, which has flux equal to 
0. 
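For a constant field and a flat sheet the flux integral collapses to (F · n) times the area, which makes the three cases easy to compare. The sketch below assumes a unit-speed stream in the x-direction, a sheet of area 1, and a 45 degree tilt for case B; the original figure is not available, so these numbers are illustrative only.

```python
import math


def flux_through_flat_sheet(F, unit_normal, area):
    """Flux of a constant field F through a flat sheet: (F . n) * area."""
    return sum(f * n for f, n in zip(F, unit_normal)) * area


def tilted_normal(angle_degrees):
    """Unit normal making the given angle with the x-axis (the flow direction)."""
    a = math.radians(angle_degrees)
    return (math.cos(a), math.sin(a), 0.0)


stream = (1.0, 0.0, 0.0)   # assumed: water flowing in the x-direction at unit speed

flux_A = flux_through_flat_sheet(stream, tilted_normal(0), 1.0)    # sheet faces the flow
flux_B = flux_through_flat_sheet(stream, tilted_normal(45), 1.0)   # sheet tilted (assumed angle)
flux_C = flux_through_flat_sheet(stream, tilted_normal(90), 1.0)   # sheet parallel to the flow
print(flux_A, flux_B, flux_C)
```

The three values decrease from case A to case C, with case C equal to zero up to rounding, in line with the ordering described above.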

The Divergence Theorem states that the flux of a vector field through 
a boundary surface is exactly equal to the sum (integral) of the divergence 
of the vector field in the interior. In some sense the divergence must be an 
infinitesimal measure of the flux of a vector field. 


5.4 A Physical Interpretation of Stokes’ Theorem 


Here we discuss the notion of the circulation of a vector field with respect 
to a curve. We will give the definition, then discuss what it means. 


Definition 5.4.1 Let C be a smooth curve in R^3 with unit tangent vector 
field T(x, y, z). The circulation of a vector field F(x, y, z) along the curve 
C is 


    \int_C F \cdot T \, ds. 

Let F be a vector field representing a flowing stream of water, such as: 


[Figure: the vector field F of the stream, drawn as arrows] 


Put a thin wire (a curve C) into this stream with a small bead attached to 
it, with the bead free to move up and down the wire. 


[Figure: four placements of the wire C in the stream, labelled a, b, c and d] 


In case a, the water will not move the bead at all. In case b the bead will be 
pushed along the curve while in case c the water will move the bead the most 
quickly. In case d, not only will the bead not want to move along the curve 
C, effort is needed to even move the bead at all. These qualitative judgments 
are captured quantitatively in the above definition for circulation, since the 
dot product F · T measures at each point how much of the vector field F 
is pointing in the direction of the tangent T and hence how much of F is 
pointing in the direction of the curve. 

In short, circulation measures how much of the vector field flows in the 
direction of the curve C. In physics, the vector field is frequently the force, 
in which case the circulation is a measurement of work. 

Thus Stokes’ Theorem is stating that the circulation of a vector field 
along a curve ∂M which bounds a surface M is precisely equal to the integral 
of the normal component of the vector field curl(F) over the interior. This is why the term 
‘curl’ is used, as it measures the infinitesimal tendency of a vector field to 
have circulation, or in other words, it provides an infinitesimal measure of 
the “whirlpoolness” of the vector field. 


5.5 Sketch of a Proof of the Divergence Theorem 


This will only be a sketch, as we will be making a number of simplifying 
assumptions. First, assume that our three-dimensional manifold M (a solid) 
is simple, meaning that any line parallel to the x-axis, y-axis or z-axis 
can only intersect M in a connected line segment or a point. Thus 

[Figure: an example solid] 

is simple while 

[Figure: a second example solid] 

is not. 
Denote the components of the vector field by 


    F(x, y, z) = (f_1(x, y, z), f_2(x, y, z), f_3(x, y, z)) 
               = (f_1, f_2, f_3). 


On the boundary surface ∂M, denote the unit normal vector field by: 


    n(x, y, z) = (n_1(x, y, z), n_2(x, y, z), n_3(x, y, z)) 
               = (n_1, n_2, n_3). 



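We want to show that 


    \iint_{\partial M} F \cdot n \, dS = \iiint_M \mathrm{div}(F) \, dx \, dy \, dz. 


In other words, we want 


    \iint_{\partial M} (f_1 n_1 + f_2 n_2 + f_3 n_3) \, dS = \iiint_M \left( \frac{\partial f_1}{\partial x} + \frac{\partial f_2}{\partial y} + \frac{\partial f_3}{\partial z} \right) dx \, dy \, dz. 


If we can show 

    \iint_{\partial M} f_1 n_1 \, dS = \iiint_M \frac{\partial f_1}{\partial x} \, dx \, dy \, dz, 

    \iint_{\partial M} f_2 n_2 \, dS = \iiint_M \frac{\partial f_2}{\partial y} \, dx \, dy \, dz, 

    \iint_{\partial M} f_3 n_3 \, dS = \iiint_M \frac{\partial f_3}{\partial z} \, dx \, dy \, dz, 

we will be done. 
We will just sketch the proof of the last equation 

    \iint_{\partial M} f_3(x, y, z) \, n_3(x, y, z) \, dS = \iiint_M \frac{\partial f_3}{\partial z} \, dx \, dy \, dz, 


since the other two equalities will hold for similar reasons. 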

The function n_3(x, y, z) is the z-component of the normal vector field 
n(x, y, z). By the assumption that M is simple, we can split the boundary 
component ∂M into three connected pieces: {∂M}_top, where n_3 > 0, 
{∂M}_side, where n_3 = 0 and {∂M}_bottom, where n_3 < 0. 
For example, if ∂M is 

[Figure: a boundary surface ∂M] 

then 

[Figure: the same surface with the pieces {∂M}_top, {∂M}_side and {∂M}_bottom labelled] 



Then we can split the boundary surface integral into three parts: 


    \iint_{\partial M} f_3 n_3 \, dS = \iint_{\partial M_{top}} f_3 n_3 \, dS + \iint_{\partial M_{side}} f_3 n_3 \, dS + \iint_{\partial M_{bottom}} f_3 n_3 \, dS 

                                     = \iint_{\partial M_{top}} f_3 n_3 \, dS + \iint_{\partial M_{bottom}} f_3 n_3 \, dS, 


since n_3, the normal component in the z direction, will be zero on ∂M_side. 
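Further, again by the assumption of simplicity, there is a region R in 
the xy-plane such that {∂M}_top is the image of a function 


    (x, y) \to (x, y, t(x, y)) 


[Figure: the top surface as the graph (x, y, t(x, y)) over the region R] 


and {∂M}_bottom is the image of a function 


    (x, y) \to (x, y, b(x, y)). 


[Figure: the bottom surface as the graph (x, y, b(x, y)) over the region R] 
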
Then 


    \iint_{\partial M} f_3 n_3 \, dS = \iint_{\partial M_{top}} f_3 n_3 \, dS + \iint_{\partial M_{bottom}} f_3 n_3 \, dS 

                                     = \iint_R f_3(x, y, t(x, y)) \, dx \, dy - \iint_R f_3(x, y, b(x, y)) \, dx \, dy 

                                     = \iint_R \left( f_3(x, y, t(x, y)) - f_3(x, y, b(x, y)) \right) dx \, dy, 


where the minus sign in front of the last term comes from the fact that 
the normal to ∂M_bottom points downward. But this is just 


    \iint_R \int_{b(x, y)}^{t(x, y)} \frac{\partial f_3}{\partial z} \, dz \, dx \, dy, 


by the Fundamental Theorem of Calculus. This, in turn, is equal to 


    \iiint_M \frac{\partial f_3}{\partial z} \, dx \, dy \, dz, 


which is what we wanted to show. 

To prove the full result, we would need to take any solid M and show 
that we can split M into simple parts and then that if the Divergence 
Theorem is true on each simple part, it is true on the original M. While 
not intuitively difficult, this is nontrivial to prove and involves some subtle 
questions of convergence. 


5.6 Sketch of a Proof for Stokes’ Theorem 


Let M be a surface with boundary curve ∂M. 


[Figure: a surface M with boundary curve ∂M] 


We break the proof of Stokes’ Theorem into two steps. First, given two 
rectangles R_1 and R_2 that share a common side, we want 


    \int_{\partial R_1} F \cdot T \, ds + \int_{\partial R_2} F \cdot T \, ds = \int_{\partial (R_1 \cup R_2)} F \cdot T \, ds, 


where T is the unit tangent vector. 

[Figure: two rectangles R_1 and R_2 sharing a common side ℓ] 


Second, we need to show that Stokes’ Theorem is true on infinitesimally 
small rectangles. 

The proof of the first is that for the common side ℓ of the two rectangles, 
the orientations are in opposite directions. This forces the value of the dot 
product (F · T) along ℓ as a side of the rectangle R_1 to have opposite sign 
of the value of (F · T) along ℓ as a side of the other rectangle R_2. Thus 


    \int_{\ell \subset \partial R_1} F \cdot T \, ds = - \int_{\ell \subset \partial R_2} F \cdot T \, ds. 


Since the boundary of the union of the two rectangles R_1 ∪ R_2 does not 
contain the side ℓ, we have 


    \int_{\partial R_1} F \cdot T \, ds + \int_{\partial R_2} F \cdot T \, ds = \int_{\partial (R_1 \cup R_2)} F \cdot T \, ds. 


Before proving that Stokes’ Theorem is true on infinitesimally small 
rectangles, assume for a moment that we already know this to be true. 
Split the surface M into (infinitely many) small rectangles. 


Then 


    \iint_M \mathrm{curl}(F) \cdot n \, dS = \sum_{\text{small rectangles}} \iint \mathrm{curl}(F) \cdot n \, dS 

                                           = \sum \int_{\partial(\text{each rectangle})} F \cdot T \, ds, 


since we are assuming that Stokes’ Theorem is true on infinitesimally small 
rectangles. But by the first step, the above sum will equal the single 
integral over the boundary of the union of the small rectangles 


    \int_{\partial M} F \cdot T \, ds, 

which gives us Stokes’ Theorem. Hence all we need to show is that Stokes’ 
Theorem is true for infinitesimally small rectangles. 

Before showing this, note that this argument is nonrigorous, as the whole 
sum is over infinitely many small rectangles, and thus subtle convergence 
questions would need to be solved. We pass over this in silence. 

Now to sketch why Stokes’ Theorem is true for infinitesimally small 
rectangles. This will also contain the justification for why the definition of 
the curl of a vector field is what it is. 

By a change of coordinates, we can assume that our small rectangle R 
lies in the xy-plane with one vertex being the origin (0, 0). 


Its unit normal vector will be n = (0, 0, 1). 
If the vector field is F(x, y, z) = (f_1, f_2, f_3), we have 


    \mathrm{curl}(F) \cdot n = \frac{\partial f_2}{\partial x} - \frac{\partial f_1}{\partial y}. 

We want to show that: 

    \left( \frac{\partial f_2}{\partial x} - \frac{\partial f_1}{\partial y} \right) dx \, dy = \int_{\partial R} F \cdot T \, ds, 


where T is the unit tangent vector to the boundary rectangle ∂R and dx dy 
is the infinitesimal area for the rectangle R. 


Now to calculate \int_{\partial R} F \cdot T \, ds. 
The four sides of the rectangle OR have the following parametrizations. 

Side   Parametrization                           Integral 
I:     s(t) = (tΔx, 0), 0 ≤ t ≤ 1                \int_0^1 f_1(t\Delta x, 0) \, \Delta x \, dt 
II:    s(t) = (Δx, tΔy), 0 ≤ t ≤ 1               \int_0^1 f_2(\Delta x, t\Delta y) \, \Delta y \, dt 
III:   s(t) = (Δx - tΔx, Δy), 0 ≤ t ≤ 1          \int_0^1 -f_1(\Delta x - t\Delta x, \Delta y) \, \Delta x \, dt 
IV:    s(t) = (0, Δy - tΔy), 0 ≤ t ≤ 1           \int_0^1 -f_2(0, \Delta y - t\Delta y) \, \Delta y \, dt 


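It is always the case, for any function f(t), that 


    \int_0^1 f(t) \, dt = \int_0^1 f(1 - t) \, dt 


by changing the variable t to 1 - t. Thus the integrals for sides III and 
IV can be replaced by \int_0^1 -f_1(t\Delta x, \Delta y) \, \Delta x \, dt and \int_0^1 -f_2(0, t\Delta y) \, \Delta y \, dt. 
Then 


    \int_{\partial R} F \cdot T \, ds 

      = \int_{I} F \cdot T \, ds + \int_{II} F \cdot T \, ds + \int_{III} F \cdot T \, ds + \int_{IV} F \cdot T \, ds 

      = \int_0^1 \left( f_1(t\Delta x, 0) \Delta x + f_2(\Delta x, t\Delta y) \Delta y - f_1(t\Delta x, \Delta y) \Delta x - f_2(0, t\Delta y) \Delta y \right) dt 

      = \int_0^1 \left( f_2(\Delta x, t\Delta y) - f_2(0, t\Delta y) \right) \Delta y \, dt - \int_0^1 \left( f_1(t\Delta x, \Delta y) - f_1(t\Delta x, 0) \right) \Delta x \, dt 

      = \int_0^1 \left( \frac{f_2(\Delta x, t\Delta y) - f_2(0, t\Delta y)}{\Delta x} - \frac{f_1(t\Delta x, \Delta y) - f_1(t\Delta x, 0)}{\Delta y} \right) \Delta x \, \Delta y \, dt, 


which converges to 


    \int_0^1 \left( \frac{\partial f_2}{\partial x} - \frac{\partial f_1}{\partial y} \right) dx \, dy \, dt 


as Δx, Δy → 0. But this last integral will be 


    \left( \frac{\partial f_2}{\partial x} - \frac{\partial f_1}{\partial y} \right) dx \, dy, 

which is what we wanted. 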

Again, letting Δx, Δy → 0 is a nonrigorous step. Also, the whole 
nonchalant way in which we changed coordinates to put our rectangle into 
the xy-plane would have to be justified in a rigorous proof. 
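The limiting statement can be tested numerically: the circulation around a small rectangle with a vertex at the origin, divided by its area, should approach the z-component of the curl at the origin. This is a minimal sketch; the components f_1 = -y + xy and f_2 = x + x^2 are illustrative choices (not from the text) for which ∂f_2/∂x - ∂f_1/∂y = 2 + x, so the value at the origin is 2.

```python
def f1(x, y):
    return -y + x * y      # illustrative component, chosen for this sketch


def f2(x, y):
    return x + x * x       # illustrative component, chosen for this sketch


def circulation_small_rectangle(dx, dy, n=1000):
    """Sum the four side integrals I-IV with the midpoint rule in t."""
    total = 0.0
    for i in range(n):
        t = (i + 0.5) / n
        side_1 = f1(t * dx, 0.0) * dx               # side I
        side_2 = f2(dx, t * dy) * dy                # side II
        side_3 = -f1(dx - t * dx, dy) * dx          # side III
        side_4 = -f2(0.0, dy - t * dy) * dy         # side IV
        total += (side_1 + side_2 + side_3 + side_4) / n
    return total


dx = dy = 1e-3
ratio = circulation_small_rectangle(dx, dy) / (dx * dy)
print(ratio)   # close to 2, the value of df2/dx - df1/dy at the origin
```

Shrinking `dx` and `dy` further moves the ratio closer to 2, which is the numerical counterpart of letting Δx, Δy → 0.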


5.7 Books 


Most calculus books have sections near the end on the multivariable calculus 
covered in this chapter. A long-time popular choice is Thomas and Finney’s 
text [36]. Another good source is Stewart’s Calculus [108]. 

Questions in physics, especially in electricity and magnetism, were the 
main historical motivation for the development of the mathematics in this 
chapter. There are physical “proofs” of the Divergence Theorem and Stokes’ 
Theorem. Good sources are Halliday and Resnick’s text in physics [51] 
and Feynman’s Lectures on Physics [35]. 


5.8 Exercises 


1. Extend the proof of the Divergence Theorem, given in this chapter for 
simple regions, to the region: 


[Figure: the region for this exercise] 


2. Let D be the disc of radius r, with boundary circle ∂D, given by the 
equation: 
    D = \{ (x, y, 0) : x^2 + y^2 \le r^2 \}. 


For the vector field 


    F(x, y, z) = (x + y + z, 3x + 2y + 4z, 5x - 3y + z), 
find the path integral \int_{\partial D} F \cdot T \, ds, where T is the unit tangent vector of 
the circle ∂D. 
3. Consider the vector field 


    F(x, y, z) = (x, 2y, 5z). 


Find the surface integral \iint_{\partial M} F \cdot n \, dS, where the surface ∂M is the 
boundary of the ball 
    M = \{ (x, y, z) : x^2 + y^2 + z^2 \le r^2 \} 


of radius r centered at the origin and n is the unit normal vector. 
4. Let S be the surface that is the image of the map 
    r : R^2 \to R^3 
given by 
    r(u, v) = (x(u, v), y(u, v), z(u, v)). 
Considering the image of the line v = constant, justify to yourself that 


    \left( \frac{\partial x}{\partial u}, \frac{\partial y}{\partial u}, \frac{\partial z}{\partial u} \right) 

is a tangent vector to S. 

5. Green’s Theorem is: 


Theorem 5.8.1 (Green’s Theorem) Let σ be a simple loop in C and Ω 
its interior. If P(x, y) and Q(x, y) are two real-valued differentiable 
functions, then 

    \int_\sigma P \, dx + Q \, dy = \iint_\Omega \left( \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} \right) dx \, dy. 


By putting the region Ω into the plane z = 0 and letting our vector field 
be (P(x, y), Q(x, y), 0), show that Green’s Theorem follows from Stokes’ 
Theorem. 


Chapter 6 


Differential Forms and 
Stokes’ Theorem 


Basic Objects: Differential Forms and Manifolds 
Basic Goal: Stokes’ Theorem 


In the last chapter we saw various theorems, all of which related the values 
of a function on the boundary of a geometric object with the values of the 
function’s derivative on the interior. The goal of this chapter is to show 
that there is a single theorem (Stokes’ Theorem) underlying all of these 
results. Unfortunately, a lot of machinery is needed before we can even state 
this grand underlying theorem. Since we are talking about integrals and 
derivatives, we have to develop the techniques that will allow us to integrate 
on k-dimensional spaces. This will lead to differential forms, which are the 
objects on manifolds that can be integrated. The exterior derivative is the 
technique for differentiating these forms. Since integration is involved, we 
will have to talk about calculating volumes. This is done in section one. 
Section two defines differential forms. Section three links differential forms 
with the vector fields, gradients, curls and divergences from last chapter. 
Section four gives the definition of a manifold (actually, three different 
methods for defining manifolds are given). Section five concentrates on 
what it means for a manifold to be orientable. In section six, we define 
how to integrate a differential form along a manifold, allowing us finally in 
section seven to state and to sketch a proof of Stokes’ Theorem. 


6.1 Volumes of Parallelepipeds 


In this chapter, we are ultimately interested in understanding integration 
on manifolds (which we have yet to define). This section, though, is pure 
linear algebra, but linear algebra that is crucial for the rest of the chapter. 

The problem is the following: In R^n, suppose we are given k vectors 
v_1, ..., v_k. These k vectors will define a parallelepiped in R^n. The question 
is how to compute the volume of this parallelepiped. For example, consider 
the two vectors 


    v_1 = (1, 2, 3)^T \quad \text{and} \quad v_2 = (3, 2, 1)^T. 


The parallelepiped that these two vectors span is a parallelogram in R^3. 
We want a formula to calculate the area of this parallelogram. (Note: 
the true three dimensional volume of this flat parallelogram is zero, in the 
same way that the length of a point is zero and that the area of a line is 
zero; we are here trying to measure the two-dimensional “volume” of this 
parallelogram.) 

We already know the answer in two special cases. For a single vector 


    v = (a_1, \ldots, a_n)^T 


in R^n, the parallelepiped is the single vector v. Here by “volume” we mean 
the length of this vector, which is, by the Pythagorean theorem, 


    \sqrt{a_1^2 + \cdots + a_n^2}. 


The other case is when we are given n vectors in R^n. Suppose the n vectors 
are 


    v_1 = (a_{11}, \ldots, a_{n1})^T, \quad \ldots, \quad v_n = (a_{1n}, \ldots, a_{nn})^T. 


Here we know that the volume of the resulting parallelepiped is 


    \left| \det \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix} \right|, 


following from one of the definitions of the determinant given in Chapter 
One. Our eventual formula will yield both of these results. 

We will first give the formula and then discuss why it is reasonable. 
Write the k vectors v_1, ..., v_k as column vectors. Set 


    A = (v_1, \ldots, v_k), 

an n x k matrix. We denote the transpose of A by A^T, the k x n matrix 


    A^T = \begin{pmatrix} v_1^T \\ \vdots \\ v_k^T \end{pmatrix}, 

where each v_i^T is the writing of the vector v_i as a row vector. Then 


Theorem 6.1.1 The volume of the parallelepiped spanned by the vectors 
v_1, ..., v_k is 

    \sqrt{\det(A^T A)}. 


Before sketching a proof, let us look at some examples. Consider the single 
vector 

    v = (a_1, \ldots, a_n)^T. 


Here the matrix A is just v itself. Then 


    \sqrt{\det(A^T A)} = \sqrt{\det(v^T v)} 

                       = \sqrt{ \det \left( (a_1, \ldots, a_n) \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix} \right) } 

                       = \sqrt{\det(a_1^2 + \cdots + a_n^2)} 

                       = \sqrt{a_1^2 + \cdots + a_n^2}, 


the length of the vector v. 
Now consider the case of n vectors v_1, ..., v_n. Then the matrix A is 
n x n. We will use that det(A) = det(A^T). Then 


    \sqrt{\det(A^T A)} = \sqrt{\det(A^T) \det(A)} 

                       = \sqrt{\det(A)^2} 

                       = |\det(A)|, 

as desired. 
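
The formula is easy to try on the two vectors v_1 = (1, 2, 3) and v_2 = (3, 2, 1) from the start of the section. The following is a minimal sketch using only the standard library; the helper names `det`, `gram_matrix` and `volume` are illustrative, and the cofactor-expansion determinant is only sensible for small matrices.

```python
import math


def det(m):
    """Determinant by cofactor expansion along the first row (fine for small matrices)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def gram_matrix(vectors):
    """The k x k matrix A^T A: entry (i, j) is the dot product of v_i and v_j."""
    return [[sum(a * b for a, b in zip(u, v)) for v in vectors] for u in vectors]


def volume(vectors):
    """k-dimensional volume of the parallelepiped spanned by the given vectors."""
    return math.sqrt(det(gram_matrix(vectors)))


v1, v2 = [1, 2, 3], [3, 2, 1]
area = volume([v1, v2])
print(gram_matrix([v1, v2]), area)   # [[14, 10], [10, 14]] and sqrt(96)
```

The same function reproduces the two special cases: `volume([[3, 4]])` is the length 5 of a single vector, and for n vectors in R^n it returns |det(A)|.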
Now to see why in general \sqrt{\det(A^T A)} must be the volume. We need 
a preliminary lemma that yields a more intrinsic, geometric approach to 
\sqrt{\det(A^T A)}. 


Lemma 6.1.1 For the matrix 


    A = (v_1, \ldots, v_k) 
we have that 

    A^T A = \begin{pmatrix} |v_1|^2 & v_1 \cdot v_2 & \cdots & v_1 \cdot v_k \\ \vdots & \vdots & & \vdots \\ v_k \cdot v_1 & v_k \cdot v_2 & \cdots & |v_k|^2 \end{pmatrix}, 


where v_i · v_j denotes the dot product of vectors v_i and v_j and |v_i| = \sqrt{v_i \cdot v_i} 
denotes the length of the vector v_i. 


The proof of this lemma is just looking at 


    A^T A = \begin{pmatrix} v_1^T \\ \vdots \\ v_k^T \end{pmatrix} (v_1, \ldots, v_k). 

Notice that if we apply any linear transformation on R^n that preserves 
angles and lengths (in other words, if we apply a rotation to R^n), the 
numbers |v_i| and v_i · v_j do not change. (The set of all linear transformations 
of R^n that preserve angles and lengths form a group that is called the 
orthogonal group and denoted by O(n).) This will allow us to reduce the 
problem to the finding of the volume of a parallelepiped in R^k. 

Sketch of Proof of Theorem: We know that 


    \sqrt{\det(A^T A)} = \sqrt{ \det \begin{pmatrix} |v_1|^2 & v_1 \cdot v_2 & \cdots & v_1 \cdot v_k \\ \vdots & \vdots & & \vdots \\ v_k \cdot v_1 & v_k \cdot v_2 & \cdots & |v_k|^2 \end{pmatrix} }. 


We will show that this must be the volume. Recall the standard basis for 
R^n: 


    e_1 = (1, 0, \ldots, 0)^T, \quad e_2 = (0, 1, \ldots, 0)^T, \quad \ldots, \quad e_n = (0, 0, \ldots, 1)^T. 


We can find a rotation of R^n that both preserves lengths and angles 
and, more importantly, rotates our vectors v_1, ..., v_k so that they lie in the 
span of the first k standard vectors e_1, ..., e_k. (To rigorously show this 
takes some work, but it is geometrically reasonable.) After this rotation, 
the last n - k entries for each vector v_i are zero. Thus we can view our 
parallelepiped as being formed from k vectors in R^k. But we already know 
how to compute this; it is 


    \sqrt{ \det \begin{pmatrix} |v_1|^2 & v_1 \cdot v_2 & \cdots & v_1 \cdot v_k \\ \vdots & \vdots & & \vdots \\ v_k \cdot v_1 & v_k \cdot v_2 & \cdots & |v_k|^2 \end{pmatrix} }. 


We are done. □ 


6.2 Differential Forms and the Exterior 
Derivative 


This will be a long and, at times, technical section. We will initially define 
elementary k-forms on R^n, for which there is still clear geometric meaning. 
We will then use these elementary k-forms to generate general k-forms. 
Finally, and for now no doubt the most unintuitive part, we will give the 
definition for the exterior derivative, a device that will map k-forms to 
(k+ 1)-forms and will eventually be seen to be a derivative-type operation. 
In the next section we will see that the gradient, the divergence and the curl 
of the last chapter can be interpreted in terms of the exterior derivative. 


6.2.1 Elementary k-forms 


We start with trying to understand elementary 2-forms in R^3. Label the 
coordinate axes for R^3 as x_1, x_2, x_3. There will be three elementary 2-forms, 
which will be denoted by dx_1 ∧ dx_2, dx_1 ∧ dx_3 and dx_2 ∧ dx_3. We must now 
determine what these symbols mean. (We will define 1-forms in a moment.) 

In words, dx_1 ∧ dx_2 will measure the signed area of the projection onto 
the x_1x_2-plane of any parallelepiped in R^3, dx_1 ∧ dx_3 will measure the 
signed area of the projection onto the x_1x_3-plane of any parallelepiped in 
R^3 and dx_2 ∧ dx_3 will measure the signed area of the projection onto the 
x_2x_3-plane of any parallelepiped in R^3. 

By looking at an example, we will see how to actually compute with 
these 2-forms. Consider two vectors in R^3, labelled 


    v_1 = (1, 2, 3)^T \quad \text{and} \quad v_2 = (3, 2, 1)^T. 

These vectors span a parallelepiped P in R^3. Consider the projection map 
π : R^3 → R^2 of R^3 to the x_1x_2-plane. Thus 


    \pi(x_1, x_2, x_3) = (x_1, x_2). 


We define dx_1 ∧ dx_2 acting on the parallelepiped P to be the area of π(P). 


Note that 

    \pi(v_1) = (1, 2)^T \quad \text{and} \quad \pi(v_2) = (3, 2)^T. 


Then π(P) is the parallelogram: 


[Figure: the parallelogram π(P) spanned by π(v_1) and π(v_2)] 


and the signed area is 


    dx_1 \wedge dx_2(P) = \det(\pi(v_1), \pi(v_2)) 
                        = \det \begin{pmatrix} 1 & 3 \\ 2 & 2 \end{pmatrix} 
                        = -4. 


In general, given a 3 x 2 matrix 


    A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ a_{31} & a_{32} \end{pmatrix}, 


its two columns will define a parallelepiped. Then dx_1 ∧ dx_2 of this 
parallelepiped will be 


    dx_1 \wedge dx_2(A) = \det \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}. 


In the same way, dx_1 ∧ dx_3 will measure the area of the projection of a 
parallelepiped onto the x_1x_3-plane. Then 


    dx_1 \wedge dx_3(A) = \det \begin{pmatrix} a_{11} & a_{12} \\ a_{31} & a_{32} \end{pmatrix}. 

Likewise, we need 


    dx_2 \wedge dx_3(A) = \det \begin{pmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{pmatrix}. 

Before defining elementary k-forms in general, let us look at elementary 
1-forms. In R^3, there are three elementary 1-forms, which will be denoted 
by dx_1, dx_2 and dx_3. Each will measure the one-dimensional volume (the 
length) of the projection of a one-dimensional parallelepiped in R^3 to a 
coordinate axis. For example, with 

    v = (1, 2, 3)^T, 

its projection to the x_1-axis is just (1). Then we want to define 


    dx_1(v) = dx_1 \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} = 1. 

In general, for a vector 

    (a_{11}, a_{21}, a_{31})^T 

we have 

    dx_1 \begin{pmatrix} a_{11} \\ a_{21} \\ a_{31} \end{pmatrix} = a_{11}, \quad dx_2 \begin{pmatrix} a_{11} \\ a_{21} \\ a_{31} \end{pmatrix} = a_{21}, \quad dx_3 \begin{pmatrix} a_{11} \\ a_{21} \\ a_{31} \end{pmatrix} = a_{31}. 

Now to define elementary k-forms on R^n. Label the coordinates of R^n as 
x_1, ..., x_n. Choose an increasing subsequence of length k from (1, 2, ..., n), 
which we will denote by 


    I = (i_1, \ldots, i_k) 
with 1 ≤ i_1 < ... < i_k ≤ n. Let 

    A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1k} \\ \vdots & & & \vdots \\ a_{n1} & \cdots & \cdots & a_{nk} \end{pmatrix} 


be an n x k matrix. Its columns will span a k-dimensional parallelepiped 
P in R^n. For convenience of exposition, let A_i be the ith row of A, i.e., 

    A = \begin{pmatrix} A_1 \\ \vdots \\ A_n \end{pmatrix}. 

We want the elementary k-form 

    dx_I = dx_{i_1} \wedge \cdots \wedge dx_{i_k} 


to act on the matrix A to give us the k-dimensional volume of the 
parallelepiped P projected onto the k-dimensional x_{i_1}, ..., x_{i_k} space. This 
motivates the definition: 


    dx_I(A) = dx_{i_1} \wedge \cdots \wedge dx_{i_k}(A) = \det \begin{pmatrix} A_{i_1} \\ \vdots \\ A_{i_k} \end{pmatrix}. 

Elementary k-forms are precisely the devices that measure the volumes of 
k-dimensional parallelepipeds after projecting to coordinate k-spaces. The 


calculations come down to taking determinants of the original matrix with 
some of its rows deleted. 
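
The definition translates directly into "keep the rows named by I, then take a determinant". The sketch below applies it to the matrix whose columns are the example vectors v_1 = (1, 2, 3) and v_2 = (3, 2, 1); the function names are illustrative, and the small cofactor-expansion `det` is an assumption made to keep the example self-contained.

```python
from itertools import combinations


def det(m):
    """Determinant by cofactor expansion along the first row (fine for small matrices)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def elementary_form(I, A):
    """dx_I(A): keep rows i_1 < ... < i_k (1-based) of the n x k matrix A, then take det."""
    return det([A[i - 1] for i in I])


# Columns are v1 = (1, 2, 3) and v2 = (3, 2, 1), the example vectors from the text.
A = [[1, 3],
     [2, 2],
     [3, 1]]

values = {I: elementary_form(I, A) for I in combinations(range(1, 4), 2)}
print(values)   # {(1, 2): -4, (1, 3): -8, (2, 3): -4}
```

As an aside, the squares of these three projected areas add up to 16 + 64 + 16 = 96, which is det(A^T A) for this matrix, so the three elementary 2-forms together recover the area of the parallelogram.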


6.2.2 The Vector Space of k-forms 


Recall back in Chapter One that we gave three different interpretations for 
the determinant of a matrix. The first was just how to compute it. The 
third was in terms of volumes of parallelepipeds, which is why determinants 
are showing up here. We now want to concentrate on the second interpre- 
tation, which in words was that the determinant is a multilinear map on 
the space of columns of a matrix. More precisely, if M_{nk}(R) denotes the 
space of all n x k matrices with real entries, we had that the determinant 
of an n x n matrix A is defined as the unique real-valued function 


    \det : M_{nn}(R) \to R 


satisfying: 
a) det(A_1, ..., λA_k, ..., A_n) = λ det(A_1, ..., A_k, ..., A_n). 
b) det(A_1, ..., A_k + λA_i, ..., A_n) = det(A_1, ..., A_n) for k ≠ i. 
c) det(Identity matrix) = 1. 

A k-form will have a similar looking definition: 


Definition 6.2.1 A k-form ω is a real-valued function 
    \omega : M_{nk}(R) \to R 
satisfying: 
    \omega(A_1, \ldots, \lambda B + \mu C, \ldots, A_k) = \lambda \, \omega(A_1, \ldots, B, \ldots, A_k) + \mu \, \omega(A_1, \ldots, C, \ldots, A_k). 


Thus w is a multilinear real-valued function. 

By the properties of determinants, we can see that each elementary k-form 
dx_I is in fact a k-form. (Of course this would have to be the case, or 
we wouldn’t have called them elementary k-forms in the first place.) But 
in fact we have 


Theorem 6.2.1 The k-forms for a vector space R^n form a vector space of 
dimension \binom{n}{k}. The elementary k-forms are a basis for this vector space. 


This vector space is denoted by Λ^k(R^n). 


We will not prove this theorem. It is not hard to prove that the k-forms 
are a vector space. It takes a bit more work to show that the elementary 
k-forms are a basis for Λ^k(R^n). 

Finally, note that 0-forms are just the real numbers themselves. 


6.2.3. Rules for Manipulating k-forms 


There is a whole machinery for manipulating k-forms. In particular, a k-form 
and an l-form can be combined to make a (k + l)-form. The method 
for doing this is not particularly easy to intuitively understand, but once 
you get the hang of it, it is a straightforward computational tool. We will 
look carefully at the R^2 case, then describe the general rule for combining 
forms and finally see how this relates to the R^n case. 

Let x_1 and x_2 be the coordinates for R^2. Then dx_1 and dx_2 are the 
two elementary 1-forms and dx_1 ∧ dx_2 is the only elementary 2-form. But 
it looks, at least notationally, that the two 1-forms dx_1 and dx_2 somehow 
make up the 2-form dx_1 ∧ dx_2. We will see that this is indeed the case. 


Let 

    v_1 = \begin{pmatrix} a_{11} \\ a_{21} \end{pmatrix} \quad \text{and} \quad v_2 = \begin{pmatrix} a_{12} \\ a_{22} \end{pmatrix} 

be two vectors in R^2. Then 


    dx_1(v_1) = a_{11} \quad \text{and} \quad dx_1(v_2) = a_{12} 


and 

    dx_2(v_1) = a_{21} \quad \text{and} \quad dx_2(v_2) = a_{22}. 


The 2-form dx_1 ∧ dx_2 acting on the 2 x 2 matrix (v_1, v_2) is the area of the 
parallelogram spanned by the vectors v_1 and v_2 and is hence the 
determinant of the matrix (v_1, v_2). Thus 


    dx_1 \wedge dx_2(v_1, v_2) = a_{11} a_{22} - a_{12} a_{21}. 

But note that this equals 

    dx_1(v_1) \, dx_2(v_2) - dx_1(v_2) \, dx_2(v_1). 

At some level we have related our 2-form dx_1 ∧ dx_2 with our 1-forms dx_1 
and dx_2, but it is not clear what is going on. In particular, at first glance 
it would seem to make more sense to change the above minus sign to a plus 
sign, but then, unfortunately, nothing would work out correctly. 

We need to recall a few facts about the permutation group on n 
elements, S_n. (There is more discussion about permutations in Chapter 
Eleven.) Each element of S_n permutes the ordering of the set {1, 2, ..., n}. 
In general, every element of S_n can be expressed as the composition of flips 
(or transpositions). 

If we need an even number of flips to express an element, we say that 
the element has sign 0 while if we need an odd number of flips, then the sign 
is 1. (Note that in order for this to be well-defined, we need to show that 
if an element has sign 0 (1), then it can only be written as the composition 
of an even (odd) number of flips; this is indeed true, but we will not show 
it.) 

Consider S_2. There are only two ways we can permute the set {1,2}. 
We can either just leave {1,2} alone (the identity permutation), which has 
sign 0, or flip {1,2} to {2,1}, which has sign 1. We will denote the flip that 
sends {1,2} to {2,1} by (1,2). There are six ways of permuting the three 
elements {1,2,3} and thus six elements in S3. Each can be written as the 
composition of flips. For example, the permutation that sends {1,2,3} to 
{3, 1,2} (which means that the first element is sent to the second slot, the 
second to the third slot and the third to the first slot) is the composition of 
the flip (1,2) with the flip (1,3), since, starting with {1, 2,3} and applying 
the flip (1,2), we get {2,1,3}. Then applying the flip (1,3) (which just 
interchanges the first and third elements), we get {3, 1,2}. 

We will use the following notational convention. If $\sigma$ denotes the flip
(1,2), then we say that

$$\sigma(1) = 2 \text{ and } \sigma(2) = 1.$$

Similarly, if $\sigma$ denotes the element (1,2) composed with (1,3) in $S_3$, then
we write

$$\sigma(1) = 2, \ \sigma(2) = 3 \text{ and } \sigma(3) = 1,$$

since under this permutation one is sent to two, two is sent to three and 
three is sent to one. 
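The bookkeeping above can be sketched in a few lines of Python. The helper names are hypothetical, and the sign is computed by counting inversions, a standard equivalent of counting flips that the text itself does not use.

```python
def apply_flip(arrangement, i, j):
    """Interchange the entries in slots i and j (slots are numbered from 1)."""
    out = list(arrangement)
    out[i - 1], out[j - 1] = out[j - 1], out[i - 1]
    return out

def as_sigma(arrangement):
    """Return sigma as a dict: sigma[element] = slot that now holds the element."""
    return {element: slot for slot, element in enumerate(arrangement, start=1)}

def sign(sigma):
    """0 for an even permutation, 1 for an odd one, via the parity of inversions."""
    n = len(sigma)
    inversions = sum(1 for a in range(1, n + 1) for b in range(a + 1, n + 1)
                     if sigma[a] > sigma[b])
    return inversions % 2

# The flip (1,2) followed by the flip (1,3), applied to {1,2,3}.
arrangement = apply_flip(apply_flip([1, 2, 3], 1, 2), 1, 3)
sigma = as_sigma(arrangement)
print(arrangement)   # the arrangement {3,1,2}
print(sigma)         # sigma(1) = 2, sigma(2) = 3, sigma(3) = 1
print(sign(sigma))   # two flips, so the sign is 0
```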

Suppose we have a k-form and an l-form. Let n = k + l. We will consider
a special subset of $S_n$, the (k,l) shuffles, which are all elements $\sigma \in S_n$ that
have the property that

$$\sigma(1) < \sigma(2) < \cdots < \sigma(k)$$

and

$$\sigma(k+1) < \sigma(k+2) < \cdots < \sigma(k+l).$$
Thus the element $\sigma$ that is the composition of (1,2) with (1,3) is a (2,1)
shuffle, since

$$\sigma(1) = 2 < 3 = \sigma(2).$$

Denote the set of all (k,l) shuffles by S(k,l). One of the exercises at the
end of the chapter is to justify why these are called shuffles.
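To get a feel for S(k,l), the following Python sketch lists all (k,l) shuffles. The function name `shuffles` is an assumption made for illustration. A shuffle is determined by which k values are taken by $\sigma(1), \ldots, \sigma(k)$, so the sketch simply runs over all k-element subsets.

```python
from itertools import combinations

def shuffles(k, l):
    """Yield each (k, l) shuffle as the tuple (sigma(1), ..., sigma(k + l))."""
    n = k + l
    for first in combinations(range(1, n + 1), k):    # increasing sigma(1..k)
        rest = tuple(i for i in range(1, n + 1) if i not in first)
        yield first + rest                             # increasing sigma(k+1..n)

print(list(shuffles(2, 1)))   # [(1, 2, 3), (1, 3, 2), (2, 3, 1)]
```

The last tuple, (2, 3, 1), is the permutation with $\sigma(1) = 2$, $\sigma(2) = 3$ and $\sigma(3) = 1$ discussed above.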
We can finally formally define the wedge product. 


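Definition 6.2.2 Let $A = (A_1, \ldots, A_{k+l})$ be an $N \times (k+l)$ matrix, for
any N. (Here each $A_i$ denotes a column vector.) Let $\tau$ be a k-form and $\omega$
be an l-form. Then define

$$\tau \wedge \omega (A) = \sum_{\sigma \in S(k,l)} (-1)^{\mathrm{sign}(\sigma)} \, \tau(A_{\sigma(1)}, \ldots, A_{\sigma(k)}) \, \omega(A_{\sigma(k+1)}, \ldots, A_{\sigma(k+l)}).$$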


Using this definition allows us to see that the wedge in $\mathbf{R}^2$ of two elementary
1-forms does indeed give us an elementary 2-form. A long calculation
will show that in $\mathbf{R}^3$, the wedge of three elementary 1-forms yields the
elementary 3-form.
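The definition of the wedge product can be tried out directly. Below is a minimal Python sketch under stated assumptions: a k-form is modelled as a plain function of k column vectors, the names `wedge`, `sign` and `dx` are hypothetical, and the sign of a permutation is computed from the parity of its inversions.

```python
from itertools import combinations

def sign(perm):
    """Parity of a permutation given as a tuple of values: 0 even, 1 odd."""
    n = len(perm)
    return sum(perm[a] > perm[b] for a in range(n) for b in range(a + 1, n)) % 2

def wedge(tau, k, omega, l):
    """Return tau ^ omega (tau a k-form, omega an l-form) as a function of k + l columns."""
    n = k + l
    def form(*cols):
        total = 0
        for first in combinations(range(n), k):        # one (k, l) shuffle per subset
            rest = tuple(i for i in range(n) if i not in first)
            s = (-1) ** sign(first + rest)
            total += (s * tau(*(cols[i] for i in first))
                        * omega(*(cols[i] for i in rest)))
        return total
    return form

# Elementary 1-forms on R^3: dx[i] picks out coordinate i (indexed from 0 here).
dx = [lambda v, i=i: v[i] for i in range(3)]

dx1_dx2 = wedge(dx[0], 1, dx[1], 1)
dx1_dx2_dx3 = wedge(dx1_dx2, 2, dx[2], 1)

v1, v2, v3 = (1, 2, 3), (0, 1, 4), (5, 6, 0)
print(dx1_dx2(v1, v2))            # determinant of the top 2 x 2 block of (v1, v2)
print(dx1_dx2_dx3(v1, v2, v3))    # determinant of the 3 x 3 matrix (v1, v2, v3)
```

Building `wedge(dx[1], 1, dx[0], 1)` and evaluating it on the same columns gives the negative of `dx1_dx2`, a small instance of anti-commutativity.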
It can be shown by these definitions that two 1-forms will anti-commute,
meaning that

$$dx \wedge dy = -dy \wedge dx.$$

In general, we have that if $\tau$ is a k-form and $\omega$ is an l-form, then

$$\tau \wedge \omega = (-1)^{kl} \, \omega \wedge \tau.$$

This can be proven by directly calculating from the above definition of
wedge product (though this method of proof is not all that enlightening).
Note that for k and l both being odd, this means that

$$\tau \wedge \omega = (-1) \, \omega \wedge \tau.$$

Then for k being odd, we must have that

$$\tau \wedge \tau = (-1) \, \tau \wedge \tau,$$

which can only occur if

$$\tau \wedge \tau = 0.$$

In particular, this means that it is always the case that

$$dx_i \wedge dx_i = 0$$

and

$$dx_i \wedge dx_j = -dx_j \wedge dx_i.$$
6.2.4 Differential k-forms and the Exterior Derivative


Here the level of abstraction will remain high. We are after a general 
notion of what can be integrated (which will be the differential k-forms) 
and a general notion of what a derivative can be (which will be the exterior 
derivative). 

First to define differential k-forms. In $\mathbf{R}^n$, if we let $I = \{i_1, \ldots, i_k\}$
denote some subsequence of integers with

$$1 \le i_1 < \cdots < i_k \le n,$$

then we let

$$dx_I = dx_{i_1} \wedge \cdots \wedge dx_{i_k}.$$

Then a differential k-form $\omega$ is:

$$\omega = \sum_{\text{all possible } I} f_I \, dx_I,$$

where each $f_I = f_I(x_1, \ldots, x_n)$ is a differentiable function.
Thus

$$(x_1 + \sin(x_2)) \, dx_1 + x_1 x_2 \, dx_2$$

is an example of a differential 1-form, while

$$e^{x_1 + x_3} \, dx_1 \wedge dx_3 + x_2^3 \, dx_2 \wedge dx_3$$

is a differential 2-form.

Each differential k-form defines at each point of $\mathbf{R}^n$ a different k-form.
For example, the differential 1-form $(x_1 + \sin(x_2)) \, dx_1 + x_1 x_2 \, dx_2$ is the
1-form $3 \, dx_1$ at the point (3,0) and is $5 \, dx_1 + 2\pi \, dx_2$ at the point $(4, \pi/2)$.
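The pointwise evaluation can be checked mechanically. In this small Python sketch the function name `one_form_at` is a hypothetical helper that returns the coefficients of $dx_1$ and $dx_2$ at a given point.

```python
import math

def one_form_at(x1, x2):
    """Coefficients (of dx_1, dx_2) of (x1 + sin(x2)) dx_1 + x1 x2 dx_2 at (x1, x2)."""
    return (x1 + math.sin(x2), x1 * x2)

print(one_form_at(3, 0))              # coefficients 3 and 0: the 1-form 3 dx_1
print(one_form_at(4, math.pi / 2))    # coefficients 5 and 2*pi
```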

To define the exterior derivative, we first define the exterior derivative 
of a differential 0-form and then by induction define the exterior derivative 
for a general differential k-form. We will see that the exterior derivative is 
a map from k-forms to (k + 1)-forms: 


$$d : k\text{-forms} \to (k+1)\text{-forms}.$$


A differential 0-form is just another name for a differentiable function.
Given a 0-form $f(x_1, \ldots, x_n)$, its exterior derivative, denoted by df, is:

$$df = \sum_{i=1}^{n} \frac{\partial f}{\partial x_i} \, dx_i.$$

For example, if $f(x_1, x_2) = x_1 x_2 + x_2^3$, then

$$df = x_2 \, dx_1 + (x_1 + 3x_2^2) \, dx_2.$$
Note that the gradient of f is the similar looking $(x_2, x_1 + 3x_2^2)$. We will
see in the next section that this is not chance.
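The coefficients of df in this example can be compared with finite-difference estimates of the partial derivatives. This is only a numerical sketch: the helper `partials`, the step size `h` and the sample point are assumptions chosen for illustration.

```python
def partials(f, point, h=1e-6):
    """Central-difference estimates of the coefficients of df at a point."""
    coeffs = []
    for i in range(len(point)):
        up = list(point)
        up[i] += h
        down = list(point)
        down[i] -= h
        coeffs.append((f(*up) - f(*down)) / (2 * h))
    return coeffs

def f(x1, x2):
    return x1 * x2 + x2 ** 3

x1, x2 = 2.0, -1.0
print(partials(f, (x1, x2)))        # numerical coefficients of dx_1 and dx_2
print([x2, x1 + 3 * x2 ** 2])       # the formula from the text at the same point
```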
Given a k-form $\omega = \sum_{\text{all possible } I} f_I \, dx_I$, the exterior derivative $d\omega$
is:

$$d\omega = \sum_{\text{all possible } I} df_I \wedge dx_I.$$


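For example, in $\mathbf{R}^3$, let

$$\omega = f_1 \, dx_1 + f_2 \, dx_2 + f_3 \, dx_3$$

be some 1-form. Then

$$\begin{aligned}
d\omega &= df_1 \wedge dx_1 + df_2 \wedge dx_2 + df_3 \wedge dx_3 \\
&= \left( \frac{\partial f_1}{\partial x_1} dx_1 + \frac{\partial f_1}{\partial x_2} dx_2 + \frac{\partial f_1}{\partial x_3} dx_3 \right) \wedge dx_1 \\
&\quad + \left( \frac{\partial f_2}{\partial x_1} dx_1 + \frac{\partial f_2}{\partial x_2} dx_2 + \frac{\partial f_2}{\partial x_3} dx_3 \right) \wedge dx_2 \\
&\quad + \left( \frac{\partial f_3}{\partial x_1} dx_1 + \frac{\partial f_3}{\partial x_2} dx_2 + \frac{\partial f_3}{\partial x_3} dx_3 \right) \wedge dx_3 \\
&= \left( \frac{\partial f_3}{\partial x_1} - \frac{\partial f_1}{\partial x_3} \right) dx_1 \wedge dx_3 + \left( \frac{\partial f_2}{\partial x_1} - \frac{\partial f_1}{\partial x_2} \right) dx_1 \wedge dx_2 \\
&\quad + \left( \frac{\partial f_3}{\partial x_2} - \frac{\partial f_2}{\partial x_3} \right) dx_2 \wedge dx_3.
\end{aligned}$$

Note that this looks similar to the curl of the vector field

$$(f_1, f_2, f_3).$$

Again, we will see that this similarity is not just chance.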
Key to many calculations is: 


Proposition 6.2.1 For any differential k-form w, we have 
d(dw) = 0. 


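The proof is one of the exercises at the end of the chapter, but you need to
use that in $\mathbf{R}^n$ the order of differentiation does not matter, i.e.,

$$\frac{\partial}{\partial x_i} \frac{\partial f}{\partial x_j} = \frac{\partial}{\partial x_j} \frac{\partial f}{\partial x_i},$$

and that $dx_i \wedge dx_j = -dx_j \wedge dx_i$.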
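To see how these two facts combine, here is the simplest case worked out as a sketch: a 0-form f on $\mathbf{R}^2$, assumed to have continuous second partial derivatives. It is meant only as an illustration of the mechanism, not as the general proof asked for in the exercise.

```latex
\begin{aligned}
d(df) &= d\left( \frac{\partial f}{\partial x_1}\, dx_1 + \frac{\partial f}{\partial x_2}\, dx_2 \right) \\
      &= \frac{\partial^2 f}{\partial x_2 \partial x_1}\, dx_2 \wedge dx_1
       + \frac{\partial^2 f}{\partial x_1 \partial x_2}\, dx_1 \wedge dx_2
       && (\text{terms with } dx_i \wedge dx_i \text{ vanish}) \\
      &= \left( \frac{\partial^2 f}{\partial x_1 \partial x_2}
       - \frac{\partial^2 f}{\partial x_2 \partial x_1} \right) dx_1 \wedge dx_2
       && (dx_2 \wedge dx_1 = -dx_1 \wedge dx_2) \\
      &= 0 && (\text{order of differentiation does not matter}).
\end{aligned}
```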
6.3 Differential Forms and Vector Fields


The overall goal for this chapter is to show that the classical Divergence 
Theorem, Green’s Theorem and Stokes’ Theorem are all special cases of one 
general theorem. This one general theorem will be stated in the language 
of differential forms. In order to see how it reduces to the theorems of last 
chapter, we need to relate differential forms with functions and vector fields. 
In $\mathbf{R}^3$, we will see that the exterior derivative, under suitable interpretation,
will correspond to the gradient, the curl and the divergence. 

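Let x, y and z denote the standard coordinates for $\mathbf{R}^3$. Our first step is
to define maps

$$\begin{aligned}
T_0 &: \text{0-forms} \to \text{functions on } \mathbf{R}^3 \\
T_1 &: \text{1-forms} \to \text{vector fields on } \mathbf{R}^3 \\
T_2 &: \text{2-forms} \to \text{vector fields on } \mathbf{R}^3 \\
T_3 &: \text{3-forms} \to \text{functions on } \mathbf{R}^3.
\end{aligned}$$

We will see that $T_0$, $T_1$ and $T_3$ have natural definitions. The definition for
$T_2$ will take a bit of justification.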

In the last section, we saw that differential 0-forms are just functions.
Thus $T_0$ is just the identity map. From last section, we know that there
are three elementary 1-forms: dx, dy and dz. Thus a general differential
1-form will be

$$\omega = f_1(x,y,z) \, dx + f_2(x,y,z) \, dy + f_3(x,y,z) \, dz,$$

where $f_1$, $f_2$ and $f_3$ are three separate functions on $\mathbf{R}^3$. Then define

$$T_1(\omega) = (f_1, f_2, f_3).$$

The definition for $T_3$ is just as straightforward. We know that on $\mathbf{R}^3$ there
is only a single elementary 3-form, namely $dx \wedge dy \wedge dz$. Thus a general
differential 3-form looks like:

$$\omega = f(x,y,z) \, dx \wedge dy \wedge dz,$$

where f is a function on $\mathbf{R}^3$. Then we let

$$T_3(\omega) = f(x,y,z).$$

As we mentioned, the definition for $T_2$ is not as straightforward. There
are three elementary 2-forms: $dx \wedge dy$, $dx \wedge dz$ and $dy \wedge dz$. A general
differential 2-form looks like:

$$\omega = f_1(x,y,z) \, dx \wedge dy + f_2(x,y,z) \, dx \wedge dz + f_3(x,y,z) \, dy \wedge dz,$$

where, as expected, $f_1$, $f_2$ and $f_3$ are functions on $\mathbf{R}^3$. Define the map $T_2$
by:

$$T_2(\omega) = (f_3, -f_2, f_1).$$


One method for justifying this definition will be that it will allow us to 
prove the theorems needed to link the exterior derivative with the gradient, 
the curl and the divergence. A second method will be in terms of dual 
spaces, as we will see in a moment. 

We want to show: 


Theorem 6.3.1 On $\mathbf{R}^3$, let $\omega_k$ denote a k-form. Then

$$T_1(d\omega_0) = \mathrm{grad}(T_0(\omega_0)),$$

$$T_2(d\omega_1) = \mathrm{curl}(T_1(\omega_1)),$$

and

$$T_3(d\omega_2) = \mathrm{div}(T_2(\omega_2)).$$

Each is a calculation (and is an exercise at the end of this chapter). We
needed to define $T_2$ as we did in order to make the above work; this is one
of the ways that we can justify our definition for the map $T_2$.
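The middle identity can be spot-checked numerically at a single point. The Python sketch below is an illustration under assumptions: partial derivatives are estimated by central differences, the sample 1-form and the point are arbitrary choices, and all function names are hypothetical.

```python
def partial(f, j, p, h=1e-5):
    """Central-difference estimate of the j-th partial derivative of f at p."""
    up = list(p)
    up[j] += h
    down = list(p)
    down[j] -= h
    return (f(*up) - f(*down)) / (2 * h)

def d_of_one_form(fs, p):
    """Coefficients of d(f1 dx + f2 dy + f3 dz) on dx^dy, dx^dz, dy^dz at p."""
    coeff = {(0, 1): 0.0, (0, 2): 0.0, (1, 2): 0.0}
    for i, f in enumerate(fs):              # the term df_i ^ dx_i
        for j in range(3):
            if j == i:
                continue                    # dx_i ^ dx_i = 0
            value = partial(f, j, p)        # coefficient of dx_j ^ dx_i
            if j < i:
                coeff[(j, i)] += value
            else:
                coeff[(i, j)] -= value      # dx_j ^ dx_i = -dx_i ^ dx_j
    return coeff[(0, 1)], coeff[(0, 2)], coeff[(1, 2)]

def T2(two_form):
    """The map T_2 from the text: coefficients (f1, f2, f3) go to (f3, -f2, f1)."""
    g1, g2, g3 = two_form
    return (g3, -g2, g1)

def curl(fs, p):
    """The classical curl of the vector field (f1, f2, f3) at p."""
    f1, f2, f3 = fs
    return (partial(f3, 1, p) - partial(f2, 2, p),
            partial(f1, 2, p) - partial(f3, 0, p),
            partial(f2, 0, p) - partial(f1, 1, p))

# Sample 1-form: yz dx + x^2 dy + xyz dz, evaluated at the point (1, 2, 3).
fs = (lambda x, y, z: y * z, lambda x, y, z: x * x, lambda x, y, z: x * y * z)
p = (1.0, 2.0, 3.0)
print(T2(d_of_one_form(fs, p)))
print(curl(fs, p))
```

Both printed triples agree up to rounding. Changing the sign or the order inside `T2` breaks the agreement, which is the first justification for the definition of $T_2$.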

There is another justification for why $T_2$ must be what it is. This
approach is a bit more abstract, but ultimately more important, as it
generalizes to higher dimensions. Consider $\mathbf{R}^n$ with coordinates $x_1, \ldots, x_n$.
There is only a single elementary n-form, namely $dx_1 \wedge \cdots \wedge dx_n$. Thus
the vector space $\Lambda^n(\mathbf{R}^n)$ of n-forms on $\mathbf{R}^n$ is one-dimensional and can be
identified with the real numbers $\mathbf{R}$. Label this map by

$$T : \Lambda^n(\mathbf{R}^n) \to \mathbf{R}.$$

Thus $T(\alpha \, dx_1 \wedge \cdots \wedge dx_n) = \alpha$.

We now want to see that the dual vector space to $\Lambda^k(\mathbf{R}^n)$ can be naturally
identified with the vector space $\Lambda^{n-k}(\mathbf{R}^n)$. Let $\omega_{n-k}$ be in $\Lambda^{n-k}(\mathbf{R}^n)$.
We first show how an (n - k)-form can be interpreted as a linear map on
$\Lambda^k(\mathbf{R}^n)$. If $\omega_k$ is any k-form, define

$$\omega_{n-k}(\omega_k) = T(\omega_{n-k} \wedge \omega_k).$$

It is a direct calculation that this is a linear map. From Chapter One we
know that the dual vector space has the same dimension as the original
vector space. By direct calculation, we also know that the dimensions for
$\Lambda^k(\mathbf{R}^n)$ and $\Lambda^{n-k}(\mathbf{R}^n)$ are the same. Thus $\Lambda^{n-k}(\mathbf{R}^n)$ is the dual space
to $\Lambda^k(\mathbf{R}^n)$.

Now consider the vector space $\Lambda^1(\mathbf{R}^3)$, with its natural basis of dx, dy
and dz. Its dual is then $\Lambda^2(\mathbf{R}^3)$. As a dual vector space, an element of
the natural basis is that which sends one of the basis vectors of $\Lambda^1(\mathbf{R}^3)$ to
one and the other basis vectors to zero. Thus the natural basis for $\Lambda^2(\mathbf{R}^3)$,
thought of as a dual vector space, is $dy \wedge dz$ (which corresponds to the
1-form dx, since $dy \wedge dz \wedge dx = 1 \cdot dx \wedge dy \wedge dz$), $-dx \wedge dz$ (which corresponds
to dy) and $dx \wedge dy$ (which corresponds to dz). Then identifying dx with
the row vector (1,0,0), dy with (0,1,0) and dz with (0,0,1), we see that
$dy \wedge dz$ should be identified with (1,0,0), $dx \wedge dz$ with (0,-1,0) and $dx \wedge dy$
with (0,0,1). Then the 2-form

$$\omega = f_1 \, dx \wedge dy + f_2 \, dx \wedge dz + f_3 \, dy \wedge dz$$

should indeed be identified with $(f_3, -f_2, f_1)$, which is precisely how the
map $T_2$ is defined.


6.4 Manifolds 


While manifolds are to some extent some of the most naturally occurring
geometric objects, it takes work and care to create correct definitions. In
essence, though, a k-dimensional manifold is any topological space that, in
a neighborhood of any point, looks like a ball in $\mathbf{R}^k$. We will be at first
concerned with manifolds that live in some ambient $\mathbf{R}^n$. For this type of
manifold, we give two equivalent definitions: the parametric version and
the implicit version. For each of these versions, we will carefully show that
the unit circle $S^1$ in $\mathbf{R}^2$
is a one-dimensional manifold. (Of course if we were just interested in circles
we would not need all of these definitions; we are just using the circle to
get a feel for the correctness of the definitions.) Then we will define an
abstract manifold, a type of geometric object which need not be defined in
terms of some ambient $\mathbf{R}^n$.

Consider again the circle $S^1$. Near any point $p \in S^1$ the circle looks
like an interval (admittedly a bent interval). In a similar fashion, we want
our definitions to yield that the unit sphere $S^2$ in $\mathbf{R}^3$ is a two-dimensional
manifold, since near any point $p \in S^2$,
the sphere looks like a disc (though, again, more like a bent disc). We want
to exclude from our definition of a manifold objects which contain points
for which there is no well-defined notion of a tangent space, such as

[Figure: a curve with a marked point p]

which has tangent difficulties at p, and the cone

[Figure: a cone with vertex p]

which has tangent difficulties at the vertex p. As a technical note, we
will throughout this section let M denote a second countable Hausdorff
topological space.

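For $k \le n$, a k-dimensional parametrizing map is any differentiable map

$$\phi : (\text{Ball in } \mathbf{R}^k) \to \mathbf{R}^n$$

such that the rank of the Jacobian at every point is exactly k. In local
coordinates, if $u_1, \ldots, u_k$ are the coordinates for $\mathbf{R}^k$ and if $\phi$ is described by
the n differentiable functions $\phi_1, \ldots, \phi_n$ (i.e., $\phi = (\phi_1, \ldots, \phi_n)$), we require
that at all points there is a $k \times k$ minor of the $n \times k$ Jacobian matrix

$$D\phi = \begin{pmatrix} \frac{\partial \phi_1}{\partial u_1} & \cdots & \frac{\partial \phi_1}{\partial u_k} \\ \vdots & & \vdots \\ \frac{\partial \phi_n}{\partial u_1} & \cdots & \frac{\partial \phi_n}{\partial u_k} \end{pmatrix}$$

that is invertible.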
Definition 6.4.1 (Parametrized Manifolds) The Hausdorff topological
space M in $\mathbf{R}^n$ is a k-dimensional manifold if for every point $p \in M$ in $\mathbf{R}^n$,
there is an open set U in $\mathbf{R}^n$ containing the point p and a parametrizing
map $\phi$ such that

$$\phi(\text{Ball in } \mathbf{R}^k) = M \cap U.$$

Consider the circle $S^1$. At the point p = (1,0), a parametrizing map is:

$$\phi(u) = (\sqrt{1 - u^2}, u),$$

while for the point (0,1), a parametrizing map could be:

$$\phi(u) = (u, \sqrt{1 - u^2}).$$


Given the parametrization, we will see in section five that it is easy
to find a basis for the tangent space of the manifold. More precisely the
tangent space is spanned by the columns of the Jacobian $D\phi$. This is indeed
one of the computational strengths of using parametrizations for defining
a manifold.

Another approach is to define a manifold as the zero locus of a set of
functions on $\mathbf{R}^n$. Here the normal vectors are practically given to us in the
definition.


Definition 6.4.2 (Implicit Manifolds) A set M in $\mathbf{R}^n$ is a k-dimensional
manifold if, for any point $p \in M$ there is an open set U containing p and
(n - k) differentiable functions $\rho_1, \ldots, \rho_{n-k}$ such that

1. $M \cap U = (\rho_1 = 0) \cap \cdots \cap (\rho_{n-k} = 0)$.
2. At all points in $M \cap U$, the gradient vectors
$\nabla \rho_1, \ldots, \nabla \rho_{n-k}$
are linearly independent.

It can be shown that the normal vectors are just the various $\nabla \rho_j$.
For an example, turn again to the circle $S^1$. The implicit method just
notes that

$$S^1 = \{(x,y) : x^2 + y^2 - 1 = 0\}.$$

Here we have $\rho = x^2 + y^2 - 1$. Since

$$\nabla(x^2 + y^2 - 1) = (2x, 2y)$$

is never the zero vector at a point of the circle, we are done.
The two definitions are equivalent, as discussed in the section on the
implicit function theorem. But both of these definitions depend on our
set M being in $\mathbf{R}^n$. Both critically use the properties of this ambient $\mathbf{R}^n$.
There are situations where we still want to do calculus on a set of points
which do not seem to live, in any natural way, in some $\mathbf{R}^n$. Historically
this was first highlighted in Einstein's General Theory of Relativity, in
which the universe itself was described as a 4-dimensional manifold that is
neither $\mathbf{R}^4$ nor living in any natural way in a higher dimensional $\mathbf{R}^n$. By all
accounts, Einstein was amazed that mathematicians had built up the whole
needed machinery. Our goal here is to give the definition of an abstract
manifold and then to show, once again, that $S^1$ is a manifold. Throughout
this we will be using that we already know what it means for a function
$f : \mathbf{R}^n \to \mathbf{R}^m$ to be differentiable.


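Definition 6.4.3 (Manifolds) A second countable Hausdorff topological
space M is an n-dimensional manifold if there is an open cover $(U_\alpha)$ such
that for each open set, $U_\alpha$, we have a continuous map

$$\phi_\alpha : \text{Open ball in } \mathbf{R}^n \to U_\alpha$$

that is one-to-one and onto and such that the map

$$\phi_\alpha^{-1} \phi_\beta : \phi_\beta^{-1}(U_\alpha \cap U_\beta) \to \phi_\alpha^{-1}(U_\alpha \cap U_\beta)$$

is differentiable.

[Figure: two overlapping open sets of M, each parametrized by a ball in $\mathbf{R}^n$]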
Note that $\phi_\alpha^{-1}(U_\alpha \cap U_\beta)$ and $\phi_\beta^{-1}(U_\alpha \cap U_\beta)$ are both open sets in $\mathbf{R}^n$ and
thus we do know what it means for $\phi_\alpha^{-1} \phi_\beta$ to be differentiable, as discussed
in Chapter Three. The idea is that we want to identify each open set
$U_\alpha$ in M with its corresponding open ball in $\mathbf{R}^n$. In fact, if $x_1, \ldots, x_n$
are coordinates for $\mathbf{R}^n$, we can label every point p in $U_\alpha$ as the n-tuple
given by $\phi_\alpha^{-1}(p)$. Usually people just say that we have chosen a coordinate
system for $U_\alpha$ and identify it with the coordinates $x_1, \ldots, x_n$ for $\mathbf{R}^n$. It
is this definition that motivates mathematicians to say that a manifold is
anything that locally, around each point, looks like an open ball in $\mathbf{R}^n$.

Let us now show that $S^1$ satisfies this definition of a manifold. We will
find an open cover of $S^1$ consisting of four open sets, for each of these write
down the corresponding map $\phi_i$ and then see that $\phi_1^{-1} \phi_2$ is differentiable.
(It is similar to show that the other $\phi_i^{-1} \phi_j$ are differentiable.)




Set

$$U_1 = \{(x,y) \in S^1 : x > 0\}$$

and let

$$\phi_1 : (-1,1) \to U_1$$

be defined by

$$\phi_1(u) = (\sqrt{1 - u^2}, u).$$

Here (-1,1) denotes the open interval {x : -1 < x < 1}. In a similar
fashion, set

$$\begin{aligned}
U_2 &= \{(x,y) \in S^1 : y > 0\} \\
U_3 &= \{(x,y) \in S^1 : x < 0\} \\
U_4 &= \{(x,y) \in S^1 : y < 0\}
\end{aligned}$$

and

$$\begin{aligned}
\phi_2(u) &= (u, \sqrt{1 - u^2}) \\
\phi_3(u) &= (-\sqrt{1 - u^2}, u) \\
\phi_4(u) &= (u, -\sqrt{1 - u^2}).
\end{aligned}$$

Now to show on the appropriate domain that $\phi_1^{-1} \phi_2$ is differentiable. We
have

$$\phi_1^{-1} \phi_2(u) = \phi_1^{-1}(u, \sqrt{1 - u^2}) = \sqrt{1 - u^2},$$

which is indeed differentiable for -1 < u < 1. (The other verifications are
just as straightforward.)
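The transition map just computed can be checked numerically. In this Python sketch the function names are hypothetical, and the inverse of $\phi_1$ is taken to be "read off the y-coordinate", which is valid on $U_1$. The composition only makes sense where $\phi_2(u)$ lands in $U_1$, that is for 0 < u < 1.

```python
import math

def phi1(u):
    return (math.sqrt(1 - u * u), u)      # chart for U1 = {x > 0}

def phi2(u):
    return (u, math.sqrt(1 - u * u))      # chart for U2 = {y > 0}

def phi1_inverse(point):
    """Inverse of phi1 on U1: read off the y-coordinate."""
    _x, y = point
    return y

def transition(u):
    """phi1^{-1} o phi2, defined where phi2(u) lies in U1, i.e. for 0 < u < 1."""
    return phi1_inverse(phi2(u))

for u in (0.1, 0.5, 0.9):
    x, y = phi2(u)
    # Columns: u, x^2 + y^2 (expected 1), the transition map, sqrt(1 - u^2).
    print(u, x * x + y * y, transition(u), math.sqrt(1 - u * u))
```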

We can now talk about what it means for a function to be differentiable 
on a manifold. Again, we will reduce the definition to a statement about 
the differentiability of a function from $\mathbf{R}^n$ to $\mathbf{R}$.


Definition 6.4.4 A real-valued function f on a manifold M is differentiable
if for an open cover $(U_\alpha)$ and maps $\phi_\alpha$ : Open ball in $\mathbf{R}^n \to U_\alpha$, the
composition function

$$f \circ \phi_\alpha : \text{Open ball in } \mathbf{R}^n \to \mathbf{R}$$

is differentiable.


There is still one difficulty with our abstract definition of a manifold.
The definition depends upon the existence of an open cover of M. Think
of our open cover of the circle $S^1$. Certainly there are many other open
covers that will also place a manifold structure on $S^1$, such as:

[Figure: another open cover of the circle]

but still, it's the same circle. How can we identify these different ways
of putting a manifold structure on the circle? We are led to the desire
to find a natural notion of equivalence between manifolds (as we will see,
we will denote this type of equivalence by saying that two manifolds are
diffeomorphic). Before giving a definition, we need to define what it means
to have a differentiable map between two manifolds. For notation, let M
be an m-dimensional manifold with open cover $(U_\alpha)$ and corresponding
maps $\phi_\alpha$ and let N be an n-dimensional manifold with open cover $(V_\beta)$ and
corresponding maps $\eta_\beta$.


Definition 6.4.5 Let $f : M \to N$ be a map from M to N. Let $p \in M$ with
$U_\alpha$ an open set containing p. Set q = f(p) and suppose that $V_\beta$ is an open
set containing q. Then f is differentiable at p if the map $\eta_\beta^{-1} \circ f \circ \phi_\alpha$ is
differentiable in a neighborhood of the point $\phi_\alpha^{-1}(p)$ in $\mathbf{R}^m$. The map f is
differentiable if it is differentiable at all points.


We can now define our notion of equivalence. 


Definition 6.4.6 Two manifolds M and N are diffeomorphic if there exists
a map $f : M \to N$ that is one-to-one, onto, differentiable and such that
the inverse map, $f^{-1}$, is differentiable.


Finally, by replacing the requirement that the various functions involved 
are differentiable by continuous functions, analytic functions, etc., we can 
define continuous manifolds, analytic manifolds, etc. 


6.5 Tangent Spaces and Orientations 


Before showing how to integrate differential k-forms along a k-dimensional 
manifold, we have to tackle the entirely messy issue of orientability. But 
before we can define orientability, we must define the tangent space to a 
manifold. If we use the implicit or parametric definition for a manifold, this 
will be straightforward. The definition for an abstract manifold is quite a
bit more complicated (but as with most good abstractions, it is ultimately
the right way to think about tangent vectors). 


6.5.1 Tangent Spaces for Implicit and Parametric 
Manifolds 


Let M be an implicitly defined manifold in $\mathbf{R}^n$ of dimension k. Then by
definition, for each point $p \in M$ there is an open set U containing p and
(n - k) real-valued functions $\rho_1, \ldots, \rho_{n-k}$ defined on U such that

$$(\rho_1 = 0) \cap \cdots \cap (\rho_{n-k} = 0) = M \cap U$$

and, at every point $q \in M \cap U$, the vectors

$$\nabla \rho_1(q), \ldots, \nabla \rho_{n-k}(q)$$

are linearly independent. We have

Definition 6.5.1 The normal space $N_p(M)$ to M at the point p is the
vector space spanned by the vectors

$$\nabla \rho_1(p), \ldots, \nabla \rho_{n-k}(p).$$

The tangent space $T_p(M)$ to the manifold M at the point p consists of all
vectors v in $\mathbf{R}^n$ that are perpendicular to each of the normal vectors.


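If $x_1, \ldots, x_n$ are the standard coordinates for $\mathbf{R}^n$, we have

Lemma 6.5.1 A vector $v = (v_1, \ldots, v_n)$ is in the tangent space $T_p(M)$ if
for all i = 1, ..., n - k we have

$$0 = v \cdot \nabla \rho_i(p) = \sum_{j=1}^{n} \frac{\partial \rho_i}{\partial x_j}(p) \, v_j.$$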


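The definition for the tangent space for parametrically defined manifolds
is as straightforward. Here the Jacobian of the parametrizing map will be
key. Let M be a manifold in $\mathbf{R}^n$, with the parametrizing map

$$\phi : (\text{Ball in } \mathbf{R}^k) \to \mathbf{R}^n$$

given by the n functions

$$\phi = (\phi_1, \ldots, \phi_n).$$

The Jacobian for $\phi$ is the $n \times k$ matrix

$$D\phi = \begin{pmatrix} \frac{\partial \phi_1}{\partial u_1} & \cdots & \frac{\partial \phi_1}{\partial u_k} \\ \vdots & & \vdots \\ \frac{\partial \phi_n}{\partial u_1} & \cdots & \frac{\partial \phi_n}{\partial u_k} \end{pmatrix}.$$

Definition 6.5.2 The tangent space $T_p(M)$ for M at the point p is spanned
by the columns of the matrix $D\phi$.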


The equivalence of these two approaches can, of course, be shown. 
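For the unit circle, the agreement of the two descriptions can be seen numerically: the column of the Jacobian of a parametrizing map should be perpendicular to the gradient of the defining function. The sketch below assumes the parametrization $\phi(u) = (\sqrt{1 - u^2}, u)$ and the function $\rho = x^2 + y^2 - 1$; the function names are hypothetical.

```python
import math

def grad_rho(x, y):
    """Gradient of rho(x, y) = x^2 + y^2 - 1, a normal vector to the circle."""
    return (2 * x, 2 * y)

def phi(u):
    """Parametrizing map of the circle near the point (1, 0)."""
    return (math.sqrt(1 - u * u), u)

def D_phi(u):
    """The 2 x 1 Jacobian of phi, written as its single column (dx/du, dy/du)."""
    return (-u / math.sqrt(1 - u * u), 1.0)

for u in (-0.5, 0.0, 0.3):
    x, y = phi(u)
    normal = grad_rho(x, y)
    tangent = D_phi(u)
    # Dot product of the normal with the tangent column; expected to be near 0.
    print(u, normal[0] * tangent[0] + normal[1] * tangent[1])
```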


6.5.2 Tangent Spaces for Abstract Manifolds 


Both implicitly and parametrically defined manifolds live in an ambient $\mathbf{R}^n$,
which carries with it a natural vector space structure. In particular, there
is a natural notion for vectors in $\mathbf{R}^n$ to be perpendicular. We used this
ambient space to define tangent spaces. Unfortunately, no such ambient
$\mathbf{R}^n$ exists for an abstract manifold. What we do know is what it means for
a real-valued function to be differentiable.

In calculus, we learn about differentiation as a tool to both find tangent
lines and also to compute rates of change of functions. Here we concentrate
on the derivative as a rate of change. Consider three-space, $\mathbf{R}^3$, with the
three partial derivatives $\frac{\partial}{\partial x}$, $\frac{\partial}{\partial y}$ and $\frac{\partial}{\partial z}$. Each corresponds to a tangent
direction for $\mathbf{R}^3$ but each also gives a method for measuring how fast a
function f(x,y,z) is changing, i.e.,

$$\frac{\partial f}{\partial x} = \text{how fast } f \text{ is changing in the } x\text{-direction},$$

$$\frac{\partial f}{\partial y} = \text{how fast } f \text{ is changing in the } y\text{-direction}$$

and

$$\frac{\partial f}{\partial z} = \text{how fast } f \text{ is changing in the } z\text{-direction}.$$

This is how we are going to define tangent vectors on an abstract manifold,
as rates of change for functions. We will abstract out the algebraic
properties of derivatives (namely that they are linear and satisfy Leibniz's
rule).

But we have to look at differentiable functions on M a bit more closely. 
If we want to take the derivative of a function f at a point p, we want this to 
measure the rate of change of f at p. This should only involve the values of 
f near p. What values f achieves away from p should be irrelevant. This is 
the motivation behind the following equivalence relation. Let (f,U) denote
an open set U on M containing p together with a differentiable function f
defined on U. We will say that

$$(f,U) \sim (g,V)$$

if, on the open set $U \cap V$, we have f = g. This leads us to defining

$$C_p^\infty = \{(f,U)\} / \sim.$$

We will frequently abuse notation and denote an element of $C_p^\infty$ by f. The
space $C_p^\infty$ is a vector space and captures the properties of functions close to
the point p. (For mathematical culture's sake, $C_p^\infty$ is an example of a germ
of a sheaf, in this case the sheaf of differentiable functions.)


Definition 6.5.3 The tangent space $T_p(M)$ is the space of all linear maps

$$v : C_p^\infty \to \mathbf{R}$$

such that

$$v(fg) = f(p) \, v(g) + g(p) \, v(f).$$

To finish the story, we would need to show that this definition agrees 
with the other two, but this we leave as nontrivial exercises. 
6.5.3 Orientation of a Vector Space


Our goal is to see that there are two possible orientations for any given 
vector space V. Our method is to set up an equivalence relation on the 
possible bases for V and see that there are only two equivalence classes, 
each of which we will call an orientation. 


Let $v_1, \ldots, v_n$ and $w_1, \ldots, w_n$ be two bases for V. Then there exist
unique real numbers $a_{ij}$, with i, j = 1, ..., n, such that

$$\begin{aligned}
w_1 &= a_{11} v_1 + \cdots + a_{1n} v_n \\
&\ \ \vdots \\
w_n &= a_{n1} v_1 + \cdots + a_{nn} v_n.
\end{aligned}$$

Label the $n \times n$ matrix $(a_{ij})$ by A. Then we know that $\det(A) \neq 0$. We
say that the bases $v_1, \ldots, v_n$ and $w_1, \ldots, w_n$ have the same orientation if
det(A) > 0. If det(A) < 0, then we say that the two bases have opposite
orientation. It can be shown via matrix multiplication that


Lemma 6.5.2 Having the same orientation is an equivalence relation on 
the set of bases for a vector space. 


The intuition is that two bases $v_1, \ldots, v_n$ and $w_1, \ldots, w_n$ should have
the same orientation if we can continuously move the basis $v_1, \ldots, v_n$ to
$w_1, \ldots, w_n$ so that at each step we still have a basis. In pictures for $\mathbf{R}^2$,
the bases {(1,0), (0,1)} and {(1,1), (-1,1)} have the same orientation, which
differs from that of the basis {(-1,0), (0,1)}.

[Figure: the basis $v_1 = (1,0)$, $v_2 = (0,1)$ has the same orientation as $v_1 = (1,1)$, $v_2 = (-1,1)$, but not the same orientation as $v_1 = (-1,0)$, $v_2 = (0,1)$]

Choosing an orientation for a vector space means choosing one of the 
two possible orientations, i.e., choosing some basis. 
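The determinant test is easy to run on the three bases of $\mathbf{R}^2$ mentioned above. In this Python sketch the names `det2` and `same_orientation` are hypothetical. It uses the observation that if both bases are written in standard coordinates as the rows of matrices V and W, then W = AV, so det(A) = det(W)/det(V) and only the signs of the two determinants matter.

```python
def det2(basis):
    """Determinant of the 2 x 2 matrix whose rows are the two basis vectors."""
    (a, b), (c, d) = basis
    return a * d - b * c

def same_orientation(v_basis, w_basis):
    """True when the change-of-basis matrix A from v_basis to w_basis has det(A) > 0."""
    return det2(v_basis) * det2(w_basis) > 0

standard = [(1, 0), (0, 1)]
print(same_orientation(standard, [(1, 1), (-1, 1)]))    # True
print(same_orientation(standard, [(-1, 0), (0, 1)]))    # False
```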
6.5.4 Orientation of a Manifold and its Boundary


A manifold M has an orientation if we can choose a smoothly varying
orientation for each tangent space $T_p(M)$. We ignore the technicalities of
what 'smoothly varying' means, but the idea is that we can move our basis
in a smooth manner from point to point on the manifold M.

Now let $X^\circ$ be an open connected set in our oriented manifold M such
that if X denotes the closure of $X^\circ$, then the boundary $\partial(X) = X - X^\circ$ is
a smooth manifold of dimension one less than M. For example, if $M = \mathbf{R}^2$,
an example of an $X^\circ$ could be the open unit disc

$$D = \{(x,y) : x^2 + y^2 < 1\}.$$

Then the boundary of D is the unit circle

$$S^1 = \{(x,y) : x^2 + y^2 = 1\},$$

which is a one-dimensional manifold. The open set $X^\circ$ inherits an orientation
from the ambient manifold M. Our goal is to show that the boundary
$\partial(X)$ has a canonical orientation. Let $p \in \partial(X)$. Since $\partial(X)$ has dimension
one less than M, the normal space at p has dimension one. Choose a
normal direction n that points out of X, not into X. The vector n, while
normal to $\partial(X)$, is a tangent vector to M. Choose a basis $v_1, \ldots, v_{n-1}$
for $T_p(\partial(X))$ so that the basis $n, v_1, \ldots, v_{n-1}$ agrees with the orientation
of M. It can be shown that all such chosen bases for $T_p(\partial(X))$ have the
same orientation; thus the choice of the vectors $v_1, \ldots, v_{n-1}$ determines an
orientation on the boundary manifold $\partial(X)$.


For example, let $M = \mathbf{R}^2$. At each point of $\mathbf{R}^2$, choose the basis

$$\{(1,0), (0,1)\}.$$

For the unit circle $S^1$, an outward pointing normal is always, at each point
p = (x,y), just the vector (x,y). Then the tangent vector (-y,x) will give
us a basis for $\mathbf{R}^2$ that has the same orientation as the given one. Thus we
have a natural choice of orientation for the boundary manifold.


6.6 Integration on Manifolds 


The goal of this section is to make sense out of the symbol

$$\int_M \omega,$$

where M will be a k-dimensional manifold and $\omega$ will be a differential
k-form. Thus we want to (finally) show that differential k-forms are the things
that will integrate along k-dimensional manifolds. The method will be to
reduce all calculations to doing multiple integrals on $\mathbf{R}^k$, which we know
how to do.

We will first look carefully at the case of 1-forms on $\mathbf{R}^2$. Our manifolds
will be 1-dimensional and hence curves. Let C be a curve in the plane $\mathbf{R}^2$
that is parametrized by the map:

$$\sigma : [a,b] \to \mathbf{R}^2,$$

with

$$\sigma(u) = (x(u), y(u)).$$

If f(x,y) is a continuous function defined on $\mathbf{R}^2$, then define the path
integral, $\int_C f(x,y) \, dx$, by the formula

$$\int_C f(x,y) \, dx = \int_a^b f(x(u), y(u)) \frac{dx}{du} \, du.$$
Note that the second integral is just a one-variable integral over an interval
on the real line. Likewise, the symbol $\int_C f(x,y) \, dy$ is interpreted as

$$\int_C f(x,y) \, dy = \int_a^b f(x(u), y(u)) \frac{dy}{du} \, du.$$

Using the chain rule, it can be checked that the numbers $\int_C f(x,y) \, dx$ and
$\int_C f(x,y) \, dy$ are independent of the chosen parametrizations. Both of these
are suggestive, as at least formally f(x,y)dx and f(x,y)dy look
like differential 1-forms on the plane $\mathbf{R}^2$. Consider the Jacobian of the
parametrizing map $\sigma(u)$, which is the $2 \times 1$ matrix

$$D\sigma = \begin{pmatrix} dx/du \\ dy/du \end{pmatrix}.$$
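Letting f(x,y)dx and f(x,y)dy be differential 1-forms, we have by definition
that at each point of $\sigma(u)$,

$$f(x,y) \, dx(D\sigma) = f(x,y) \, dx \begin{pmatrix} dx/du \\ dy/du \end{pmatrix} = f(x(u), y(u)) \frac{dx}{du}$$

and

$$f(x,y) \, dy(D\sigma) = f(x,y) \, dy \begin{pmatrix} dx/du \\ dy/du \end{pmatrix} = f(x(u), y(u)) \frac{dy}{du}.$$

Thus we could write the integrals $\int_C f(x,y) \, dx$ and $\int_C f(x,y) \, dy$ as

$$\int_C f(x,y) \, dx = \int_a^b f(x,y) \, dx(D\sigma) \, du$$

and

$$\int_C f(x,y) \, dy = \int_a^b f(x,y) \, dy(D\sigma) \, du.$$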


This suggests how to define in general $\int_M \omega$. We will use that $\omega$, as a
k-form, will send any $n \times k$ matrix to a real number. We will parametrize
our manifold M and take $\omega$ of the Jacobian of the parametrizing map.
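The recipe "apply the 1-form to the Jacobian of the parametrization, then integrate over the parameter interval" can be sketched numerically. The Python code below is an illustration under assumptions: the integral is approximated by the midpoint rule, the curve is the upper unit semicircle, f(x,y) = y, and the function name `path_integral_dx` is hypothetical. Two different parametrizations with the same direction of travel are used to illustrate independence of the parametrization.

```python
import math

def path_integral_dx(f, sigma, dsigma, a, b, steps=20000):
    """Midpoint-rule estimate of the integral over C of f(x, y) dx."""
    h = (b - a) / steps
    total = 0.0
    for i in range(steps):
        u = a + (i + 0.5) * h
        x, y = sigma(u)
        dx_du, _dy_du = dsigma(u)        # the Jacobian D(sigma), a 2 x 1 column
        total += f(x, y) * dx_du * h     # f(x, y) dx applied to D(sigma), times du
    return total

def f(x, y):
    return y

# Upper unit semicircle from (1, 0) to (-1, 0), parametrized in two ways.
one = path_integral_dx(
    f,
    lambda u: (math.cos(u), math.sin(u)),
    lambda u: (-math.sin(u), math.cos(u)),
    0.0, math.pi)
two = path_integral_dx(
    f,
    lambda u: (math.cos(u * u), math.sin(u * u)),
    lambda u: (-2 * u * math.sin(u * u), 2 * u * math.cos(u * u)),
    0.0, math.sqrt(math.pi))
print(one, two)   # both approximate -pi/2
```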


Definition 6.6.1 Let M be a k-dimensional oriented differentiable manifold
in $\mathbf{R}^n$ such that there is a parametrizing one-to-one onto map

$$\phi : B \to M$$

where B denotes the unit ball in $\mathbf{R}^k$. Suppose further that the parametrizing
map agrees with the orientation of the manifold M. Let $\omega$ be a differential
k-form on $\mathbf{R}^n$. Then

$$\int_M \omega = \int_B \omega(D\phi) \, du_1 \cdots du_k.$$
Via a chain rule calculation, we can show that $\int_M \omega$ is well-defined:

Lemma 6.6.1 Given two orientation preserving parametrizations $\phi_1$ and
$\phi_2$ of a k-dimensional manifold M, we have

$$\int_B \omega(D\phi_1) \, du_1 \cdots du_k = \int_B \omega(D\phi_2) \, du_1 \cdots du_k.$$

Thus $\int_M \omega$ is independent of parametrization.


We now know what $\int_M \omega$ means for a manifold that is the image of a
differentiable one-to-one onto map from a ball in $\mathbf{R}^k$. Not all manifolds can
be written as the image of a single parametrizing map. For example, the
unit sphere $S^2$ in $\mathbf{R}^3$ needs at least two such maps (basically to cover both
the north and south poles). But we can (almost) cover reasonable oriented
manifolds by a countable collection of non-overlapping parametrizations.
More precisely, we can find a collection $\{U_\alpha\}$ of non-overlapping open sets in
M such that for each $\alpha$ there exists a parametrizing orientation preserving
map

$$\phi_\alpha : B \to U_\alpha$$

and such that the space $M - \bigcup U_\alpha$ has dimension strictly smaller than k.
Then for any differential k-form we set

$$\int_M \omega = \sum_\alpha \int_{U_\alpha} \omega.$$

Of course, this definition seems to depend on our choice of open sets, but
we can show (though we choose not to) that:

Lemma 6.6.2 The value of $\int_M \omega$ is independent of choice of set $\{U_\alpha\}$.

While in principle the above summation could be infinite, in which case
questions of convergence must arise, in practice this is rarely a problem.


6.7 Stokes’ Theorem 


We now come to the goal of this chapter: 


Theorem 6.7.1 (Stokes' Theorem) Let M be an oriented k-dimensional
manifold in $\mathbf{R}^n$ with boundary $\partial M$, a smooth (k-1)-dimensional manifold
with orientation induced from the orientation of M. Let $\omega$ be a differential
(k-1)-form. Then

$$\int_M d\omega = \int_{\partial M} \omega.$$
This is a sharp quantitative version of the intuition:

Average of a function on boundary = Average of derivative on interior.

This single theorem includes as special cases the classical results of the
Divergence Theorem, Green's Theorem and the vector-calculus Stokes' Theorem.

We will explicitly prove Stokes' Theorem only in the special case that
M is a unit cube in $\mathbf{R}^k$ and when

$$\omega = f(x_1, \ldots, x_k) \, dx_2 \wedge \cdots \wedge dx_k.$$


After proving this special case, we will sketch the main ideas behind the 
proof for the general case. 
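Proof in unit cube case: Here

$$M = \{(x_1, \ldots, x_k) : \text{for each } i, \ 0 \le x_i \le 1\}.$$

The boundary $\partial M$ of this cube consists of 2k unit cubes in $\mathbf{R}^{k-1}$. We will
be concerned with the two boundary components

$$S_1 = \{(0, x_2, \ldots, x_k) \in M\}$$

and

$$S_2 = \{(1, x_2, \ldots, x_k) \in M\}.$$

For $\omega = f(x_1, \ldots, x_k) \, dx_2 \wedge \cdots \wedge dx_k$, we have

$$\begin{aligned}
d\omega &= \sum_i \frac{\partial f}{\partial x_i} \, dx_i \wedge dx_2 \wedge \cdots \wedge dx_k \\
&= \frac{\partial f}{\partial x_1} \, dx_1 \wedge dx_2 \wedge \cdots \wedge dx_k,
\end{aligned}$$

since it is always the case that $dx_j \wedge dx_j = 0$.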
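Now to integrate $d\omega$ along the unit cube M. We choose our orientation
preserving parametrizing map to be the identity map. Then

$$\int_M d\omega = \int_0^1 \cdots \int_0^1 \frac{\partial f}{\partial x_1} \, dx_1 \cdots dx_k.$$

By the Fundamental Theorem of Calculus we can do the first integral, to
get

$$\begin{aligned}
\int_M d\omega &= \int_0^1 \cdots \int_0^1 f(1, x_2, \ldots, x_k) \, dx_2 \cdots dx_k \\
&\quad - \int_0^1 \cdots \int_0^1 f(0, x_2, \ldots, x_k) \, dx_2 \cdots dx_k.
\end{aligned}$$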
Now to look at the integral $\int_{\partial M} \omega$. Since $\omega = f(x_1, \ldots, x_k)\, dx_2 \wedge \cdots \wedge dx_k$, 
the only parts of the integral along the boundary that will not be zero will be 
along $S_1$ and $S_2$, both of which are unit cubes in $\mathbb{R}^{k-1}$, with coordinates 
given by $x_2, \ldots, x_k$. They will have opposite orientations though. (This 
can be seen in the example for when M is a square in the plane; then $S_1$ 
is the bottom of the square and $S_2$ is the top of the square. Note how the 
orientations on $S_1$ and $S_2$ induced from the orientation of the square 
are indeed opposite.) 

[Figure: the square with the induced orientations on its boundary]

Then 

$$\int_{\partial M} \omega = \int_{S_1} \omega + \int_{S_2} \omega = -\int_0^1 \cdots \int_0^1 f(0, x_2, \ldots, x_k)\, dx_2 \cdots dx_k + \int_0^1 \cdots \int_0^1 f(1, x_2, \ldots, x_k)\, dx_2 \cdots dx_k,$$

which we have just shown to equal $\int_M d\omega$, as desired. □ 
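
The unit cube computation can be checked numerically in the simplest case k = 2, where M is the unit square and $\omega = f(x_1, x_2)\, dx_2$. The sketch below is only an illustration: the function f, the helper names and the midpoint rule are assumptions chosen for the example, not part of the proof above.

```python
import math

def f(x1, x2):
    # Hypothetical smooth coefficient function, chosen only for illustration.
    return math.sin(x1) * x2 + x1 ** 2

def df_dx1(x1, x2):
    # Partial derivative of f with respect to x1, computed by hand.
    return math.cos(x1) * x2 + 2.0 * x1

def midpoints(n):
    # Midpoints of n equal subintervals of [0, 1].
    return [(i + 0.5) / n for i in range(n)]

def interior_integral(n=200):
    # Midpoint-rule approximation of the integral of d(omega) over the square.
    pts = midpoints(n)
    return sum(df_dx1(a, b) for a in pts for b in pts) / (n * n)

def boundary_integral(n=200):
    # Only the faces x1 = 1 and x1 = 0 contribute, with opposite signs.
    pts = midpoints(n)
    return sum(f(1.0, b) - f(0.0, b) for b in pts) / n

if __name__ == "__main__":
    print(interior_integral(), boundary_integral())
```

Up to discretization error, the two numbers should agree, which is exactly the statement of the theorem for this square.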


Now to sketch a false general proof for a manifold M in $\mathbb{R}^n$. We will use 
that the above argument for a unit cube can be used in a similar fashion 
for any cube. Also, any general differential (k-1)-form will look like: 

$$\omega = \sum_I f_I\, dx_I,$$

where each I is a (k-1)-tuple from (1,...,n). 


Divide M into many small cubes. Adjacent cubes’ boundaries will have 
opposite orientation. 
Then 

$$\int_M d\omega \approx \sum_{\text{cubes}} \int_{\text{little cube}} d\omega = \sum_{\text{cubes}} \int_{\partial(\text{little cube})} \omega \approx \int_{\partial M} \omega.$$


The last approximation is from the fact that since the adjacent boundaries 
of the cubes have opposite orientations, they will cancel out. The only 
boundary parts that remain are those pushed out against the boundary of 
M itself. The final step would be to show that as we take more and more 
little cubes, we can replace the above approximations by equalities. 

It must be noted that M cannot be split up into this union of cubes. 
Working around this difficulty is non-trivial. 


□ 


6.8 Books 


An excellent recent book is Hubbard and Hubbard’s Vector Calculus, Linear 
Algebra, and Differential Forms: A Unified Approach [64], which contains a 
wealth of information, putting differential forms in the context of classical 
vector calculus and linear algebra. Spivak’s Calculus on Manifolds [103] is 
for many people the best source. It is short and concise (in many ways 
the opposite of Spivak’s leisurely presentation of $\epsilon$ and $\delta$ real analysis in 
[102]). Spivak emphasizes that the mathematical work should be done 
in getting the right definitions so that the theorems (Stokes’ Theorem in 
particular) follow easily. Its briefness, though, makes it possibly not the 
best introduction. Fleming’s Functions of Several Variables [37] is also a 
good introduction. 
6.9 Exercises 


1. Justify why it is reasonable for shuffles to indeed be called shuffles. 
(Think in terms of shuffling a deck of cards.) 

2. In $\mathbb{R}^3$, let dx, dy and dz denote the three elementary 1-forms. Using 
the definition of the wedge product, show that 

$$(dx \wedge dy) \wedge dz = dx \wedge (dy \wedge dz)$$

and that these are equal to the elementary 3-form $dx \wedge dy \wedge dz$. 
3. Prove that for any differential k-form $\omega$, we have 

$$d(d\omega) = 0.$$

4. In $\mathbb{R}^n$, let dx and dy be one-forms. Show that 

$$dx \wedge dy = -dy \wedge dx.$$


5. Prove Theorem 6.3.1. 
6. Show that the map 

$$\omega_{n-k}(\omega_k) = T(\omega_{n-k} \wedge \omega_k),$$

with $T : \Lambda^n \mathbb{R}^n \to \mathbb{R}$ as defined in the chapter, provides a linear map from 
$\Lambda^{n-k} \mathbb{R}^n$ to the dual space $(\Lambda^k \mathbb{R}^n)^*$. 
7. Prove that the unit sphere $S^2$ in $\mathbb{R}^3$ is a two-dimensional manifold, using 
each of the three definitions. 
8. Consider the rectangle with opposite sides identified. Show first why 
this is a torus and then why it is a two-manifold. 
9. The goal of this problem is to show that real projective space is a 
manifold. On $\mathbb{R}^{n+1} - 0$, define the equivalence relation 

$$(x_0, x_1, \ldots, x_n) \sim (\lambda x_0, \lambda x_1, \ldots, \lambda x_n)$$

for any nonzero real number $\lambda$. Define real projective n-space by 

$$P^n = (\mathbb{R}^{n+1} - (0)) / \sim.$$

Thus, in projective two-space, we identify (1,2,3) with (2,4,6) and with 
(-10,-20,-30) but not with (2,3,1) or (1,2,5). In $P^n$, we denote the 
equivalence class containing $(x_0, \ldots, x_n)$ by the notation $(x_0 : \ldots : x_n)$. 
Thus the point in $P^2$ corresponding to (1,2,3) is denoted by (1 : 2 : 3). 
Then in $P^2$, we have $(1:2:3) = (2:4:6) \neq (1:2:5)$. Define 

$$\phi_0 : \mathbb{R}^n \to P^n$$

by 

$$\phi_0(u_1, \ldots, u_n) = (1 : u_1 : \ldots : u_n),$$

define 

$$\phi_1 : \mathbb{R}^n \to P^n$$

by 

$$\phi_1(u_1, \ldots, u_n) = (u_1 : 1 : u_2 : \ldots : u_n),$$

etc., all the way up to defining a map $\phi_n$. Show that these maps can be 
used to make $P^n$ into an n-dimensional manifold. 
10. Show that the Stokes’ Theorem of this chapter has as special cases: 

a. the Fundamental Theorem of Calculus. (Note that we need to use 
the Fundamental Theorem of Calculus to prove Stokes’ Theorem; thus we 
cannot actually claim that the Fundamental Theorem of Calculus is a mere 
corollary to Stokes’ Theorem.) 

b. Green’s Theorem. 

c. the Divergence Theorem. 

d. the Stokes’ Theorem of Chapter Five. 


Chapter 7 


Curvature for Curves and 
Surfaces 


Basic Objects: Curves and surfaces in space 
Basic Goal: Calculating curvatures 


Most of high school mathematics is concerned with straight lines and planes. 
There is of course far more to geometry than these flat objects. Classically 
differential geometry is concerned with how curves and surfaces bend and 
twist in space. The word “curvature” is used to denote the various measures 
of twisting that have been discovered. 

Unfortunately, the calculations and formulas to compute the different 
types of curvature are quite involved and messy, but whatever curvature is, 
it should be the case that the curvature of a straight line and of a plane 
must be zero, that the curvature of a circle (and of a sphere) of radius r 
should be the same at every point and that the curvature of a small radius 
circle (or sphere) should be greater than the curvature of a larger radius 
circle (or sphere) (which captures the idea that it is easier to balance on 
the surface of the earth than on a bowling ball). 

The first introduction to curvature-type ideas is usually in calculus. 
While the first derivative gives us tangent line (and thus linear) informa- 
tion, it is the second derivative that measures concavity, a curvature-type 
measurement. Thus we should expect to see second derivatives in curvature 
calculations. 


7.1 Plane Curves 


We will describe a plane curve via a parametrization: 

$$r(t) = (x(t), y(t))$$

and thus as a map 

$$r : \mathbb{R} \to \mathbb{R}^2.$$

[Figure: the t-axis mapped to the curve r(t) = (x(t), y(t))]


The variable t is called the parameter (and is frequently thought of as 
time). An actual plane curve can be parametrized in many different ways. 
For example, 

$$r_1(t) = (\cos(t), \sin(t))$$

and 

$$r_2(t) = (\cos(2t), \sin(2t))$$

both describe a unit circle. Any calculation of curvature should be 
independent of the choice of parametrization. There are a couple of reasonable 
ways to do this, all of which can be shown to be equivalent. We will take 
the approach of always fixing a canonical parametrization (the arc length 
parametrization). This is the parametrization $r : [a,b] \to \mathbb{R}^2$ such that the 
arc length of the curve is just b - a. Since the arc length is 

$$\int_a^b \sqrt{\left(\frac{dx}{ds}\right)^2 + \left(\frac{dy}{ds}\right)^2}\, ds,$$

we need $\sqrt{\left(\frac{dx}{ds}\right)^2 + \left(\frac{dy}{ds}\right)^2} = 1$. Thus for the arc length parametrization, 
the length of the tangent vector must always be one: 

$$|T(s)| = \left|\frac{dr}{ds}\right| = \left|\left(\frac{dx}{ds}, \frac{dy}{ds}\right)\right| = \sqrt{\left(\frac{dx}{ds}\right)^2 + \left(\frac{dy}{ds}\right)^2} = 1.$$


Back to the question of curvature. Consider a straight line. 

[Figure: a straight line with its tangent vectors]

Note that each point of this line has the same tangent line. 
Now consider a circle: 


Here the tangent vectors’ directions are constantly changing. This leads 
to the idea of trying to define curvature as a measure of the change in the 
direction of the tangent vectors. To measure a rate of change we need to 
use a derivative. This leads to: 


Definition 7.1.1 For a plane curve parametrized by arc length 

$$r(s) = (x(s), y(s)),$$

define the principal curvature $\kappa$ at a point on the curve to be the length of 
the derivative of the tangent vector with respect to the parameter s, i.e., 

$$\kappa = \left|\frac{dT(s)}{ds}\right|.$$





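Consider the straight line r(s) = (as + b, cs + d), where a, b, c and d are 
constants (with $a^2 + c^2 = 1$, so that s is indeed arc length). The tangent 
vector is: 

$$T(s) = \frac{dr}{ds} = (a, c).$$

Then the curvature will be 

$$\kappa = \left|\frac{dT}{ds}\right| = |(0,0)| = 0,$$

as desired. 
Now consider a circle of radius a centered at the origin; an arc length 
parametrization is 

$$r(s) = \left(a \cos\left(\frac{s}{a}\right), a \sin\left(\frac{s}{a}\right)\right),$$

giving us that the curvature is 

$$\kappa = \left|\frac{dT}{ds}\right| = \left|\left(-\frac{1}{a}\cos\left(\frac{s}{a}\right), -\frac{1}{a}\sin\left(\frac{s}{a}\right)\right)\right| = \sqrt{\frac{1}{a^2}\cos^2\left(\frac{s}{a}\right) + \frac{1}{a^2}\sin^2\left(\frac{s}{a}\right)} = \frac{1}{a}.$$
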


Thus this definition of curvature does indeed agree with the intuitions about 
lines and circles that we initially desired. 
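
The two computations above can be mimicked numerically. The following is a minimal sketch, assuming the curve is already parametrized by arc length and approximating derivatives by central differences; the helper names `tangent`, `curvature`, `circle` and `line`, and the step size `h`, are choices made for this illustration only.

```python
import math

def tangent(r, s, h=1e-3):
    # Central-difference approximation of T(s) = dr/ds.
    (x0, y0), (x1, y1) = r(s - h), r(s + h)
    return ((x1 - x0) / (2 * h), (y1 - y0) / (2 * h))

def curvature(r, s, h=1e-3):
    # kappa = |dT/ds|, with dT/ds again approximated by a central difference.
    (a0, b0), (a1, b1) = tangent(r, s - h, h), tangent(r, s + h, h)
    return math.hypot((a1 - a0) / (2 * h), (b1 - b0) / (2 * h))

def circle(a):
    # Arc length parametrization of the circle of radius a.
    return lambda s: (a * math.cos(s / a), a * math.sin(s / a))

def line(s):
    # A straight line with unit speed, since 0.6^2 + 0.8^2 = 1.
    return (0.6 * s + 1.0, 0.8 * s - 2.0)

if __name__ == "__main__":
    print(curvature(circle(3.0), 0.7))  # should be close to 1/3
    print(curvature(line, 0.3))         # should be close to 0
```

The finite differences only approximate the derivatives, so the printed values agree with 1/a and 0 up to a small numerical error.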


7.2 Space Curves 


Here the situation is more difficult; there is no single number that will cap- 
ture curvature. Since we are interested in space curves, our parametriza- 
tions will have the form: 


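$$r(s) = (x(s), y(s), z(s)).$$

As in the last section, we normalize by assuming that we have parametrized by 
arc length, i.e., 

$$|T(s)| = \left|\frac{dr}{ds}\right| = \left|\left(\frac{dx}{ds}, \frac{dy}{ds}, \frac{dz}{ds}\right)\right| = \sqrt{\left(\frac{dx}{ds}\right)^2 + \left(\frac{dy}{ds}\right)^2 + \left(\frac{dz}{ds}\right)^2} = 1.$$
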

Again we start with calculating the rate of change in the direction of the 
tangent vector. 
Definition 7.2.1 For a space curve parametrized by arc length 

$$r(s) = (x(s), y(s), z(s)),$$

define the principal curvature $\kappa$ at a point to be the length of the derivative 
of the tangent vector with respect to the parameter s, i.e., 

$$\kappa = \left|\frac{dT(s)}{ds}\right|.$$

The number $\kappa$ is one of the numbers that captures curvature. Another is 
the torsion, but before giving its definition we need to do some preliminary 
work. 

Set 

$$N = \frac{1}{\kappa} \frac{dT}{ds}.$$

The vector N is called the principal normal vector. Note that it has length 
one. More importantly, as the following proposition shows, this vector is 
perpendicular to the tangent vector T(s). 


Proposition 7.2.1 

$$N \cdot T = 0$$

at all points on the space curve. 


Proof: Since we are using the arc length parametrization, the length of 
the tangent vector is always one, which means 

$$T \cdot T = 1.$$

Thus 

$$\frac{d}{ds}(T \cdot T) = \frac{d}{ds}(1) = 0.$$

By the product rule we have 

$$\frac{d}{ds}(T \cdot T) = T \cdot \frac{dT}{ds} + \frac{dT}{ds} \cdot T = 2\, T \cdot \frac{dT}{ds}.$$

Then 

$$T \cdot \frac{dT}{ds} = 0.$$

Thus the vectors T and $\frac{dT}{ds}$ are perpendicular. Since the principal normal 
vector N is a scalar multiple of the vector $\frac{dT}{ds}$, we have our result. □ 
Set 

$$B = T \times N,$$

a vector that is called the binormal vector. Since T and N are perpendicular 
and both have length one, B must also be a unit vector. Thus at each point of 
the curve we have three mutually perpendicular unit vectors T, N and B. The 
torsion will be a number associated to the rate of change in the direction of 
the binormal B, but we need a proposition before the definition can be given. 


Proposition 7.2.2 The vector $\frac{dB}{ds}$ is a scalar multiple of the principal 
normal vector N. 


Proof: We will show that $\frac{dB}{ds}$ is perpendicular to both T and B, meaning 
that $\frac{dB}{ds}$ must point along the direction of N. First, since B has length 
one, by the same argument as in the previous proposition, just replacing 
all of the T’s by B’s, we get that $\frac{dB}{ds} \cdot B = 0$. 

Now 

$$\frac{dB}{ds} = \frac{d}{ds}(T \times N) = \left(\frac{dT}{ds} \times N\right) + \left(T \times \frac{dN}{ds}\right) = (\kappa N \times N) + \left(T \times \frac{dN}{ds}\right) = T \times \frac{dN}{ds}.$$

Thus $\frac{dB}{ds}$ must be perpendicular to the vector T. □ 


Definition 7.2.2 The torsion of a space curve is the number $\tau$ such that 

$$\frac{dB}{ds} = -\tau N.$$

We need now to have an intuitive understanding of what these two numbers 
mean. Basically, the torsion measures how much the space curve deviates 
from being a plane curve, while the principal curvature measures the 
curvature of the plane curve that the space curve wants to be. Consider the 
space curve 

$$r(s) = \left(3\cos\left(\frac{s}{3}\right), 3\sin\left(\frac{s}{3}\right), 5\right),$$

which is a circle of radius three living in the plane z = 5. We will see that 
the torsion is zero. First, the tangent vector is 

$$T(s) = \frac{dr}{ds} = \left(-\sin\left(\frac{s}{3}\right), \cos\left(\frac{s}{3}\right), 0\right).$$

Then 

$$\frac{dT}{ds} = \left(-\frac{1}{3}\cos\left(\frac{s}{3}\right), -\frac{1}{3}\sin\left(\frac{s}{3}\right), 0\right),$$

which gives us that the principal curvature is $\frac{1}{3}$. The principal normal 
vector is 

$$N = \frac{1}{\kappa}\frac{dT}{ds} = \left(-\cos\left(\frac{s}{3}\right), -\sin\left(\frac{s}{3}\right), 0\right).$$

Then the binormal is 

$$B = T \times N = (0, 0, 1),$$

and thus 

$$\frac{dB}{ds} = (0, 0, 0) = 0 \cdot N.$$

The torsion is indeed zero, reflecting the fact that we are actually dealing 
with a plane curve disguised as a space curve. 
Now consider the helix 

$$r(t) = (\cos(t), \sin(t), t).$$

[Figure: the helix (cos(t), sin(t), t)]



It should be the case that the principal curvature should be a positive 
constant, as the curve wants to be a circle. Similarly, the helix is constantly 
moving out of a plane, due to the t term in the z-coordinate. Hence the 
torsion should also be a nonzero constant. The tangent vector 

$$(-\sin(t), \cos(t), 1)$$

does not have unit length. The arc length parametrization for this helix 
is simply 

$$r(t) = \left(\cos\left(\frac{t}{\sqrt{2}}\right), \sin\left(\frac{t}{\sqrt{2}}\right), \frac{t}{\sqrt{2}}\right).$$

Then the unit tangent vector is 

$$T(t) = \left(-\frac{1}{\sqrt{2}}\sin\left(\frac{t}{\sqrt{2}}\right), \frac{1}{\sqrt{2}}\cos\left(\frac{t}{\sqrt{2}}\right), \frac{1}{\sqrt{2}}\right).$$
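The principal curvature $\kappa$ is the length of the vector 

$$\frac{dT}{dt} = \left(-\frac{1}{2}\cos\left(\frac{t}{\sqrt{2}}\right), -\frac{1}{2}\sin\left(\frac{t}{\sqrt{2}}\right), 0\right).$$

Thus 

$$\kappa = \frac{1}{2}.$$

Then the principal normal vector is 

$$N(t) = 2\,\frac{dT}{dt} = \left(-\cos\left(\frac{t}{\sqrt{2}}\right), -\sin\left(\frac{t}{\sqrt{2}}\right), 0\right).$$

The binormal vector is 

$$B = T \times N = \left(\frac{1}{\sqrt{2}}\sin\left(\frac{t}{\sqrt{2}}\right), -\frac{1}{\sqrt{2}}\cos\left(\frac{t}{\sqrt{2}}\right), \frac{1}{\sqrt{2}}\right).$$

The torsion $\tau$ is the length of the vector 

$$\frac{dB}{dt} = \left(\frac{1}{2}\cos\left(\frac{t}{\sqrt{2}}\right), \frac{1}{2}\sin\left(\frac{t}{\sqrt{2}}\right), 0\right) = -\frac{1}{2} N,$$

and hence we have 

$$\tau = \frac{1}{2}.$$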
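
The values $\kappa = \frac{1}{2}$ and $\tau = \frac{1}{2}$ for the helix, and the zero torsion of the circle in the plane z = 5, can be sanity-checked numerically. The sketch below assumes arc length parametrizations and uses central differences throughout; all function names are hypothetical helpers introduced for this example, and the torsion is recovered from $\frac{dB}{ds} = -\tau N$ as $\tau = -\frac{dB}{ds} \cdot N$.

```python
import math

SQRT2 = math.sqrt(2.0)

def diff(F, s, h=1e-3):
    # Central-difference derivative of a vector-valued function F at s.
    a, b = F(s - h), F(s + h)
    return tuple((q - p) / (2 * h) for p, q in zip(a, b))

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def norm(v):
    return math.sqrt(dot(v, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def tangent(r, s):
    # T = dr/ds for an arc length parametrization r.
    return diff(r, s)

def curvature_and_normal(r, s):
    # kappa = |dT/ds| and N = (1/kappa) dT/ds (requires kappa != 0).
    dT = diff(lambda u: tangent(r, u), s)
    kappa = norm(dT)
    return kappa, tuple(c / kappa for c in dT)

def binormal(r, s):
    # B = T x N.
    return cross(tangent(r, s), curvature_and_normal(r, s)[1])

def torsion(r, s):
    # dB/ds = -tau N, so tau = -(dB/ds) . N.
    dB = diff(lambda u: binormal(r, u), s)
    return -dot(dB, curvature_and_normal(r, s)[1])

def helix(s):
    return (math.cos(s / SQRT2), math.sin(s / SQRT2), s / SQRT2)

def flat_circle(s):
    return (3 * math.cos(s / 3), 3 * math.sin(s / 3), 5.0)

if __name__ == "__main__":
    print(curvature_and_normal(helix, 0.4)[0], torsion(helix, 0.4))
    print(curvature_and_normal(flat_circle, 0.4)[0], torsion(flat_circle, 0.4))
```

Because every derivative is approximated, the output matches the exact values only up to a small numerical error.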


7.3 Surfaces 


Measuring how tangent vectors vary worked well for understanding the cur- 
vature of space curves. A possible generalization to surfaces is to examine 
the variation of the tangent planes. Since the direction of a plane is de- 
termined by the direction of its normal vector, we will define curvature 
functions by measuring the rate of the change in the normal vector. For 
example, for a plane $ax + by + cz = d$, the normal at every point is the 
vector 

$$\langle a, b, c \rangle.$$

The normal vector is a constant; there is no variation in its direction. Once 
we have the correct definitions in place, this should provide us with the 
intuitively plausible idea that since the normal is not varying, the curvature 
must be zero. 

Denote a surface by 


$$X = \{(x, y, z) : f(x, y, z) = 0\}.$$

Thus we are defining our surfaces implicitly, not parametrically. The normal 
vector at each point of the surface is the gradient of the defining function, 
i.e., 

$$n = \nabla f = \left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z}\right).$$

Since we are interested in how the direction of the normal is changing and 
not in how the length of the normal is changing (since this length can be 
easily altered without varying the original surface at all), we normalize the 
defining function f by requiring that the normal n at every point has length 
one: 

$$|n| = 1.$$

We now have the following natural map: 


Definition 7.3.1 The Gauss map is the function 

$$\sigma : X \to S^2,$$

where $S^2$ is the unit sphere in $\mathbb{R}^3$, defined by 

$$\sigma(p) = n(p) = \nabla f = \left(\frac{\partial f}{\partial x}(p), \frac{\partial f}{\partial y}(p), \frac{\partial f}{\partial z}(p)\right).$$

As we move about on the surface X, the corresponding normal vector moves 
about on the sphere. To measure how this normal vector varies, we need 
to take the derivative of the vector-valued function $\sigma$ and hence must look 
at the Jacobian of the Gauss map: 

$$d\sigma : TX \to TS^2,$$

where TX and $TS^2$ denote the respective tangent planes. If we choose 
orthonormal bases for both of the two dimensional vector spaces TX and 
$TS^2$, we can write $d\sigma$ as a two-by-two matrix, a matrix important enough 
to carry its own name: 


Definition 7.3.2 The two-by-two matrix associated to the Jacobian of the 
Gauss map is the Hessian. 
While choosing different orthonormal bases for either TX or $TS^2$ will 
lead to a different Hessian matrix, it is the case that the eigenvalues, the 
trace and the determinant will remain constant (and are hence invariants 
of the Hessian). These invariants are what we concentrate on in studying 
curvature. 


Definition 7.3.3 For a surface X, the two eigenvalues of the Hessian are 
the principal curvatures. The determinant of the Hessian (equivalently 
the product of the principal curvatures) is the Gaussian curvature and the 
trace of the Hessian (equivalently the sum of the principal curvatures) is 
the mean curvature. 


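We now want to see how to calculate these curvatures, in part in order 
to see if they agree with what our intuition demands. Luckily there is an 
easy algorithm that will do the trick. Start again with defining our surface 
X as $\{(x,y,z) : f(x,y,z) = 0\}$ such that the normal vector at each point 
has length one. Define the extended Hessian as 

$$\tilde{H} = \begin{pmatrix} \partial^2 f/\partial x^2 & \partial^2 f/\partial x \partial y & \partial^2 f/\partial x \partial z \\ \partial^2 f/\partial x \partial y & \partial^2 f/\partial y^2 & \partial^2 f/\partial y \partial z \\ \partial^2 f/\partial x \partial z & \partial^2 f/\partial y \partial z & \partial^2 f/\partial z^2 \end{pmatrix}.$$

(Note that $\tilde{H}$ does not usually have a name.) 
At a point p on X choose two orthonormal tangent vectors: 

$$v_1 = a_1 \frac{\partial}{\partial x} + b_1 \frac{\partial}{\partial y} + c_1 \frac{\partial}{\partial z} = (a_1\ b_1\ c_1),$$

$$v_2 = a_2 \frac{\partial}{\partial x} + b_2 \frac{\partial}{\partial y} + c_2 \frac{\partial}{\partial z} = (a_2\ b_2\ c_2).$$

Orthonormal means that we require 

$$v_i \cdot v_j = (a_i\ b_i\ c_i) \begin{pmatrix} a_j \\ b_j \\ c_j \end{pmatrix} = \delta_{ij},$$

where $\delta_{ij}$ is zero for $i \neq j$ and is one for i = j. Set 

$$h_{ij} = (a_i\ b_i\ c_i)\, \tilde{H} \begin{pmatrix} a_j \\ b_j \\ c_j \end{pmatrix}.$$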


Then a technical argument, heavily relying on the chain rule, will yield 
Proposition 7.3.1 Coordinate systems can be chosen so that the Hessian 
matrix is the matrix H. Thus the principal curvatures for a surface X at 
a point p are the eigenvalues of the matrix 

$$H = \begin{pmatrix} h_{11} & h_{12} \\ h_{21} & h_{22} \end{pmatrix}$$

and the Gaussian curvature is det(H) and the mean curvature is trace(H). 


We can now compute some examples. Start with a plane X given by 

$$(ax + by + cz - d = 0).$$


Since all of the second derivatives of the linear function ax + by + cz — d are 
zero, the extended Hessian is the three-by-three zero matrix, which means 
that the Hessian is the two-by-two zero matrix, which in turn means that 
the principal curvatures, the Gaussian and the mean curvature are all zero, 
as desired. 

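Now suppose $X = \{(x,y,z) : \frac{1}{2r}(x^2 + y^2 + z^2 - r^2) = 0\}$, a sphere of 
radius r. 

The normal is the unit vector 

$$\left(\frac{x}{r}, \frac{y}{r}, \frac{z}{r}\right)$$

and the extended Hessian is 

$$\tilde{H} = \begin{pmatrix} \frac{1}{r} & 0 & 0 \\ 0 & \frac{1}{r} & 0 \\ 0 & 0 & \frac{1}{r} \end{pmatrix} = \frac{1}{r} I.$$

Then given any two orthonormal vectors $v_1$ and $v_2$, we have that 

$$h_{ij} = (a_i\ b_i\ c_i)\, \tilde{H} \begin{pmatrix} a_j \\ b_j \\ c_j \end{pmatrix} = \frac{1}{r}\, v_i \cdot v_j = \frac{1}{r}\, \delta_{ij},$$

and thus that the Hessian is the following diagonal matrix 

$$H = \begin{pmatrix} \frac{1}{r} & 0 \\ 0 & \frac{1}{r} \end{pmatrix}.$$

The two principal curvatures are both $\frac{1}{r}$ and are hence independent of which 
point is considered on the sphere, again agreeing with intuition. 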
For the final example, let X be a cylinder: 

$$X = \{(x,y,z) : \tfrac{1}{2r}(x^2 + y^2 - r^2) = 0\}.$$

Since the intersection of this cylinder with any plane parallel to the xy 
plane is a circle of radius r, we should suspect that one of the principal 
curvatures should be the curvature of a circle, namely $\frac{1}{r}$. But also through 
each point on the cylinder there is a straight line parallel to the z-axis, 
suggesting that the other principal curvature should be zero. We can now 
check these guesses. The extended Hessian is 

$$\tilde{H} = \begin{pmatrix} \frac{1}{r} & 0 & 0 \\ 0 & \frac{1}{r} & 0 \\ 0 & 0 & 0 \end{pmatrix}.$$

We can choose orthonormal tangent vectors at each point of the cylinder 
of the form 

$$v_1 = (a\ b\ 0)$$

and 

$$v_2 = (0\ 0\ 1).$$

Then the Hessian is the diagonal matrix 

$$H = \begin{pmatrix} \frac{1}{r} & 0 \\ 0 & 0 \end{pmatrix},$$

meaning that one of the principal curvatures is indeed $\frac{1}{r}$ and the other is 
0. 
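
The algorithm of this section (form the extended Hessian, pick two orthonormal tangent vectors, build the two-by-two matrix of the $h_{ij}$ and take its eigenvalues) is easy to carry out by machine. The following is a small sketch under stated assumptions: the matrices are entered by hand for the normalized sphere and cylinder equations above, the sample points and tangent vectors are chosen for the illustration, and the helper names are hypothetical.

```python
import math

def quad_form(u, H, v):
    # u^T H v for 3-vectors u, v and a 3x3 matrix H given as a list of rows.
    return sum(u[i] * H[i][j] * v[j] for i in range(3) for j in range(3))

def hessian_2x2(H_ext, v1, v2):
    # The matrix (h_ij) built from the extended Hessian and two tangent vectors.
    vs = (v1, v2)
    return [[quad_form(vs[i], H_ext, vs[j]) for j in range(2)] for i in range(2)]

def principal_curvatures(h):
    # Eigenvalues of a symmetric 2x2 matrix from its trace and determinant.
    tr = h[0][0] + h[1][1]
    det = h[0][0] * h[1][1] - h[0][1] * h[1][0]
    disc = math.sqrt(max(tr * tr / 4.0 - det, 0.0))
    return (tr / 2.0 - disc, tr / 2.0 + disc)

r = 2.0

# Sphere of radius r at the point (0, 0, r): tangent vectors e1 and e2.
sphere_H = [[1 / r, 0.0, 0.0], [0.0, 1 / r, 0.0], [0.0, 0.0, 1 / r]]
sphere_k = principal_curvatures(
    hessian_2x2(sphere_H, (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)))

# Cylinder of radius r at the point (r, 0, 0): tangent vectors e2 and e3.
cyl_H = [[1 / r, 0.0, 0.0], [0.0, 1 / r, 0.0], [0.0, 0.0, 0.0]]
cyl_k = principal_curvatures(
    hessian_2x2(cyl_H, (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)))

if __name__ == "__main__":
    print(sphere_k)  # both principal curvatures equal 1/r
    print(cyl_k)     # principal curvatures 0 and 1/r
```

The Gaussian and mean curvatures are then the product and the sum of the two returned numbers.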


7.4 The Gauss-Bonnet Theorem 


Curvature is not a topological invariant. A sphere and an ellipsoid are 
topologically equivalent (intuitively meaning that one can be continuously 
deformed into the other; technically meaning that there is a topological 
homeomorphism from one onto the other) but clearly the curvatures are 
different. But we can not alter curvature too much, or more accurately, 
if we make the appropriate curvature large near one point, it must be 
compensated for at other points. That is the essence of the Gauss-Bonnet 
Theorem, which we only state in this section. 

We restrict our attention to compact orientable surfaces, which are topo- 
logically spheres, toruses, two-holed toruses, three-holed toruses, etc. 




The number of holes (called the genus g) is known to be the only topolog- 
ical invariant, meaning that if two surfaces have the same genus, they are 
topologically equivalent. 


Theorem 7.4.1 (Gauss-Bonnet) For a surface X, we have 

$$\int_X \text{Gaussian curvature} = 2\pi(2 - 2g).$$


Thus while the Gaussian curvature is not a local topological invariant, its 
average value on the surface is such an invariant. Note that the left-hand 
side of the above equation involves analysis, while the right-hand side is 
topological. Equations of the form 


Analysis information = Topological information 


permeate modern mathematics, culminating in the Atiyah-Singer Index 
Formula from the mid 1960s (which has as a special case the Gauss-Bonnet 
Theorem). By now, it is assumed that if you have a local differential in- 
variant, there should be a corresponding global topological invariant. The 
work lies in finding the correspondences. 
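
The Gauss-Bonnet equation can be tested numerically on the two simplest surfaces. The sketch below is an illustration only: it assumes the standard formulas for a round sphere ($K = 1/r^2$, area element $r^2 \sin\phi\, d\phi\, d\theta$) and for a torus of revolution with radii R > r > 0 ($K = \cos v / (r(R + r\cos v))$, area element $r(R + r\cos v)\, du\, dv$), none of which are derived in this chapter, and the function names are hypothetical.

```python
import math

def total_curvature_sphere(r, n=400):
    # K = 1/r^2 and dA = r^2 sin(phi) dphi dtheta; the theta-integral gives 2*pi.
    dphi = math.pi / n
    inner = sum((1.0 / r ** 2) * r ** 2 * math.sin((i + 0.5) * dphi) * dphi
                for i in range(n))
    return 2.0 * math.pi * inner

def total_curvature_torus(R, r, n=400):
    # K = cos(v) / (r (R + r cos v)) and dA = r (R + r cos v) du dv;
    # the u-integral gives 2*pi.
    dv = 2.0 * math.pi / n
    total = 0.0
    for i in range(n):
        v = (i + 0.5) * dv
        K = math.cos(v) / (r * (R + r * math.cos(v)))
        dA = r * (R + r * math.cos(v)) * dv
        total += K * dA
    return 2.0 * math.pi * total

def gauss_bonnet_rhs(genus):
    # The topological side of the theorem: 2*pi*(2 - 2g).
    return 2.0 * math.pi * (2 - 2 * genus)

if __name__ == "__main__":
    print(total_curvature_sphere(3.0), gauss_bonnet_rhs(0))
    print(total_curvature_torus(2.0, 0.5), gauss_bonnet_rhs(1))
```

Changing the radii changes the curvature at each point but not the totals, which is the content of the theorem for genus 0 and genus 1.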


7.5 Books 


The range in texts is immense. In part this is because the differential 
geometry of curves and surfaces is rooted in the nineteenth century while 
higher dimensional differential geometry usually has quite a twentieth 
century feel to it. Three long time popular introductions are by do Carmo 
[29], Millman and Parker [85] and O’Neill [91]. A recent innovative text, 
emphasizing geometric intuitions, is by Henderson [56]. Alfred Gray [48] has 
written a long book built around Mathematica, a major software package for 
mathematical computations. This would be a good source to see how to do 
actual calculations. Thorpe’s text [111] is also interesting. 

McCleary’s Geometry from a Differentiable Viewpoint [84] has a lot of 
material in it, which is why it is also listed in the chapter on axiomatic 
geometry. Morgan [86] has written a short, readable account of Riemannian 
geometry. Then there are the classic texts. Spivak’s five volumes [102] 
are impressive, with the first volume a solid introduction. The bible of 
the 1960s and 70s is Foundations of Differential Geometry by Kobayashi 
and Nomizu [74]; though fading in fashion, I would still recommend all 
budding differential geometers to struggle with its two volumes, but not as 
an introductory text. 


7.6 Exercises 


1. Let C be the plane curve given by r(t) = (x(t), y(t)). Show that the 
curvature at any point is 

$$\kappa = \frac{x'y'' - y'x''}{((x')^2 + (y')^2)^{3/2}}.$$

(Note that the parametrization r(t) is not necessarily the arc length 
parametrization.) 
2. Let C be the plane curve given by y = f(x). Show that a point p = 
$(x_0, y_0)$ is a point of inflection if and only if the curvature at p is zero. (Note 
that p is a point of inflection if $f''(x_0) = 0$.) 

3. For the surface described by 


2 
z= + Z, 
find the principal curvatures at each point. Sketch the surface. Does the 
sketch provide the same intuitions as the principal curvature calculations? 
4. Consider the cone 

$$z^2 = x^2 + y^2.$$

Find the image of the Gauss map. (Note that you need to make sure that 
the normal vector has length one.) What does this image have to say about 
the principal curvatures? 


5. Let 

$$A(t) = (a_1(t), a_2(t), a_3(t))$$

and 

$$B(t) = (b_1(t), b_2(t), b_3(t))$$

be two 3-tuples of differentiable functions. Show that 

$$\frac{d}{dt}\left(A(t) \cdot B(t)\right) = \frac{dA}{dt} \cdot B(t) + A(t) \cdot \frac{dB}{dt}.$$


Chapter 8 


Geometry 


Basic Objects: Points and Lines in Planes 
Basic Goal: Axioms for Different Geometries 


The axiomatic geometry of Euclid was the model for correct reasoning from 
at least as early as 300 BC to the mid 1800s. Here was a system of thought 
that started with basic definitions and axioms and then proceeded to prove 
theorem after theorem about geometry, all done without any empirical in- 
put. It was believed that Euclidean geometry correctly described the space 
that we live in. Pure thought seemingly told us about the physical world, 
which is a heady idea for mathematicians. But by the early 1800s, non- 
Euclidean geometries had been discovered, culminating in the early 1900s 
in the special and general theory of relativity, by which time it became 
clear that, since there are various types of geometry, the type of geometry 
that describes our universe is an empirical question. Pure thought can tell 
us the possibilities but does not appear able to pick out the correct one. 
(For a popular account of this development by a fine mathematician and 
mathematical gadfly, see Kline’s Mathematics and the Search for Knowledge 
[73].) 

Euclid started with basic definitions and attempted to give definitions 
for his terms. Today, this is viewed as a false start. An axiomatic system 
starts with a collection of undefined terms and a collection of relations (ax- 
ioms) among these undefined terms. We can then prove theorems based 
on these axioms. An axiomatic system “works” if no contradictions occur. 
Hyperbolic and elliptic geometries were taken seriously when it was shown 
that any possible contradiction in them could be translated back into a con- 
tradiction in Euclidean geometry, which no one seriously believes contains 
a contradiction. This will be discussed in the appropriate sections of this 
chapter. 
8.1 Euclidean Geometry 


Euclid starts with twenty-three Definitions, five Postulates and five 
Common Notions. We will give a flavor of his language by giving a few examples 
of each (following Heath’s translation of Euclid’s Elements [32]; another 
excellent source is Cederberg’s A Course in Modern Geometries [17]). 
For example, here is Euclid’s definition of a line: 


A line is breadthless length. 
and for a surface: 
A surface is that which has length and breadth only. 


While these definitions do agree with our intuitions of what these words 
should mean, to modern ears they sound vague. 

His five Postulates would today be called axioms. They set up the basic 
assumptions for his geometry. For example, his fourth postulate states: 


That all right angles are equal to one another. 


Finally, his five Common Notions are basic assumptions about equalities. 
For example, his third common notion is 


If equals be subtracted from equals, the remainders are equal. 


All of these are straightforward, except for the infamous fifth postulate. 
This postulate has a different feel than the rest of Euclid’s beginnings. 


Fifth Postulate: That, if a straight line falling on two straight lines makes 
the interior angles on the same side less than two right angles, the two 
straight lines, if produced indefinitely, meet on that side on which are the 
angles less than the two right angles. 


Certainly by looking at the picture 

[Figure: a line falling on two lines, with the interior angles and the necessary point of intersection marked]

we see that this is a perfectly reasonable statement. We would be surprised 
if this were not true. What is troubling is that this is a basic assumption. 
Axioms should not be just reasonable but obvious. This is not obvious. 
It is also much more complicated than the other postulates, even in the 
superficial way that its statement requires a lot more words than the other 
postulates. In part, it is making an assumption about the infinite, as it 
states that if you extend lines further out, there will be an intersection 
point. A feeling of uneasiness was shared by mathematicians, starting with 
Euclid himself, who tried to use this postulate as little as possible. 

One possible approach is to replace this postulate with another one that 
is more appealing, turning this troubling postulate into a theorem. There 
are a number of statements equivalent to the fifth postulate, but none that 
really do the trick. Probably the most popular is Playfair’s Axiom: 

Given a point off of a line, there is a unique line through the point 
parallel to the given line. 


[Figure: a line l, a point p off of l, and the unique line parallel to l through p]


Certainly a reasonable statement. Still, it is quite bold to make this a basic 
assumption. It would be ideal if the fifth postulate could be shown to be 
a statement provable from the other axioms. The development of other 
geometries stemmed from the failed attempts in trying to prove the fifth 
postulate. 


8.2 Hyperbolic Geometry 


One method for showing that the fifth postulate must follow from the other 
axioms is to assume it is false and find a contradiction. Using Playfair’s 
Axiom, there are two possibilities: either there are no lines through the 
point parallel to the given line or there is more than one line through the 
point parallel to the given line. These assumptions now go by the names: 


Elliptic Axiom: Given a point off of a given line, there are no lines through 
the point parallel to the line. 


This is actually just making the claim that there are no parallel lines, 
or that every two lines must intersect (which again seems absurd). 
Hyperbolic Axiom: Given a point off of a given line, there is more than 
one line through the point parallel to the line. 


What is meant by parallel must be clarified. Two lines are defined to 
be parallel if they do not intersect. 

Girolamo Saccheri (1667-1733) was the first to try to find a contradiction 
from the assumption that the fifth postulate is false. He quickly 
showed that if there is no such parallel line, then contradictions occurred. 
But when he assumed the Hyperbolic Axiom, no contradictions arose. Un- 
fortunately for Saccheri, he thought that he had found such a contradiction 
and wrote a book, Euclides ab Omni Naevo Vindicatus (Euclid Vindicated 
from all Faults), that claimed to prove that Euclid was right. 

Gauss (1777-1855) also thought about this problem and seems to have 
realized that by negating the fifth postulate, other geometries would arise. 
But he never mentioned this work to anybody and did not publish his 
results. 

It was Lobatchevsky (1792-1856) and Janos Bolyai (1802-1860) who, 
independently, developed the first non-Euclidean geometry, now called hy- 
perbolic geometry. Both showed, like Saccheri, that the Elliptic Axiom was 
not consistent with the other axioms of Euclid, and both showed, again like 
Saccheri, that the Hyperbolic Axiom did not appear to contradict the other 
axioms. Unlike Saccheri though, both confidently published their work and 
did not deign to find a fake contradiction. 

Of course, just because you prove a lot of results and do not come up 
with a contradiction does not mean that a contradiction will not occur the 
next day. In other words, Bolyai and Lobatchevsky did not have a proof 
of consistency, a proof that no contradictions could ever occur. Felix Klein 
(1849-1925) is the main figure for finding models for different geometries 
that would allow for proofs of consistency, though the model we will look 
at was developed by Poincaré (1854-1912). 

Thus the problem is how to show that a given collection of axioms forms 
a consistent theory, meaning that no contradiction can ever arise. The 
model approach will not show that hyperbolic geometry is consistent but in- 
stead show that it is as consistent as Euclidean geometry. The method is to 
model the straight lines of hyperbolic geometry as half circles in Euclidean 
geometry. Then each axiom of hyperbolic geometry will be a theorem of 
Euclidean geometry. The process can be reversed, so that each axiom of 
Euclidean geometry will become a theorem in hyperbolic geometry. Thus, 
if there is some hidden contradiction in hyperbolic geometry, there must 
also be a hidden contradiction in Euclidean geometry (a contradiction that 
no one believes to exist). 

Now for the details of the model. Start with the upper half plane 
$$H = \{(x, y) \in \mathbb{R}^2 : y > 0\}.$$

[Figure: the upper half plane H above the x-axis]


Our points will be simply the points in H. The key to our model of hy- 
perbolic geometry is how we define straight lines. We say that a line is 
either a vertical line in H or a half-circle in H that intersects the x-axis 
perpendicularly. 


[Figure: examples of lines in H.]


To see that this is indeed a model for hyperbolic geometry we would have 
to check each of the axioms. For example, we would need to check that 
between any two points there is a unique line (or in this case, show that 
for any two points in H, there is either a vertical line between them or a 
unique half-circle between them). 
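
The uniqueness claim can be made concrete. The following is a minimal sketch, assuming points are given as coordinate pairs (x, y) with y > 0; the function name `hyperbolic_line` and its return format are illustrative choices, not notation from the text. When the two points do not share an x-coordinate, the center (c, 0) of the half-circle is forced, because it must be equidistant from both points and that condition is linear in c.

```python
import math

def hyperbolic_line(p, q):
    """Return the line of the upper half-plane model through distinct points p, q.

    The result is ("vertical", x) for a vertical line, or
    ("half_circle", center, radius) for a half-circle centered at
    (center, 0) on the x-axis.
    """
    (x1, y1), (x2, y2) = p, q
    if y1 <= 0 or y2 <= 0:
        raise ValueError("points must lie in the upper half plane (y > 0)")
    if p == q:
        raise ValueError("points must be distinct")
    if x1 == x2:
        return ("vertical", x1)
    # The center (c, 0) is equidistant from p and q:
    # (x1 - c)^2 + y1^2 = (x2 - c)^2 + y2^2, which is linear in c.
    c = (x2**2 + y2**2 - x1**2 - y1**2) / (2 * (x2 - x1))
    r = math.hypot(x1 - c, y1)
    return ("half_circle", c, r)

print(hyperbolic_line((-3.0, 4.0), (3.0, 4.0)))
print(hyperbolic_line((1.0, 1.0), (1.0, 5.0)))
```

Since the equation for c has exactly one solution, there is exactly one half-circle through the two points, which is the uniqueness the axiom requires.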


[Figure: the unique line through p and q.]





The main thing to see is that for this model the Hyperbolic Axiom is
obviously true.

What this model allows us to do is to translate each axiom of hyperbolic
geometry into a theorem in Euclidean geometry. Thus the axioms about lines
in hyperbolic geometry become theorems about half-circles in Euclidean 
geometry. Therefore, hyperbolic geometry is as consistent as Euclidean 
geometry. 
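
To see the Hyperbolic Axiom at work in this model, here is a small sketch under stated assumptions: the given line is the unit half-circle centered at the origin, the point off the line is p = (0, 2), and the helper names are hypothetical. Two half-circles centered on the x-axis meet in H exactly when the corresponding full circles cross, that is, when the distance between centers lies strictly between the difference and the sum of the radii.

```python
import math

def half_circles_meet(c1, r1, c2, r2):
    """True if two distinct half-circles centered on the x-axis meet in H."""
    d = abs(c1 - c2)
    return abs(r1 - r2) < d < r1 + r2

def line_through(p, c):
    """Half-circle through p = (x, y) with center (c, 0): returns (c, radius)."""
    x, y = p
    return (c, math.hypot(x - c, y))

unit_line = (0.0, 1.0)   # the half-circle x^2 + y^2 = 1, y > 0
p = (0.0, 2.0)           # a point of H that is not on that line

# Three different lines through p, none of which meets the unit half-circle.
parallels = [line_through(p, c) for c in (-1.0, 0.5, 1.0)]
for c, r in parallels:
    print(c, round(r, 4), half_circles_meet(c, r, *unit_line))
```

Any center c with |c| at most 3/2 would work equally well here, so there are infinitely many lines through p that miss the given line.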

Further, this model shows that the fifth postulate can be assumed to 
be either true or false; this means that the fifth postulate is independent of 
the other axioms. 


8.3 Elliptic Geometry


But what if we assume the Elliptic Axiom? Saccheri, Gauss, Bolyai and
Lobatchevsky all showed that this new axiom was inconsistent with the
other axioms. Could we, though, alter these other axioms to come up with
another new geometry? Riemann (1826-1866) did precisely this, showing
that there were two ways of altering the other axioms and thus that there 
were two new geometries, today called single elliptic geometry and double 
elliptic geometry (named by Klein). For both, Klein developed models and 
thus showed that both are as consistent as Euclidean geometry. 

In Euclidean geometry, any two distinct points are on a unique line. 
Also in Euclidean geometry, a line must separate the plane, meaning that 
given any line l, there are at least two points off of l such that the line
segment connecting the two points must intersect l.

For single elliptic geometry, we assume that a line does not separate the 
plane, in addition to the Elliptic Axiom. We keep the Euclidean assumption 
that any two points uniquely determine a line. For double elliptic geometry,
we need to assume that two points can lie on more than one line, but now 
keep the Euclidean assumption that a line will separate the plane. All of 
these sound absurd if you are thinking of straight lines as the straight lines 
from childhood. But under the models that Klein developed, they make 
sense, as we will now see. 

For double elliptic geometry, our “plane” is the unit sphere, the
points are the points on the sphere and our “lines” will be the great circles
on the sphere. (The great circles are just the circles on the sphere with
greatest diameter.)





Note that any two lines will intersect (thus satisfying the Elliptic Ax- 
iom) and that while most pairs of points will uniquely define a line, points 
opposite to each other will lie on infinitely many lines. Thus statements 
about lines in double elliptic geometry will correspond to statements about 
great circles in Euclidean geometry. 
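
As a quick check that any two lines meet in this model, here is a sketch that assumes each great circle is described by a normal vector of the plane through the origin that cuts it out; this representation and the function names are illustrative assumptions. Two distinct planes through the origin share a line, and that line hits the sphere in a pair of antipodal points.

```python
import math

def cross(a, b):
    """Cross product of two vectors in R^3."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def great_circle_intersection(n1, n2):
    """Return the two antipodal points where two distinct great circles meet."""
    v = cross(n1, n2)
    length = math.sqrt(sum(t * t for t in v))
    if length == 0:
        raise ValueError("normals are parallel: the two great circles coincide")
    p = tuple(t / length for t in v)
    q = tuple(-t for t in p)
    return p, q

equator = (0.0, 0.0, 1.0)    # normal to the plane z = 0
meridian = (0.0, 1.0, 0.0)   # normal to the plane y = 0
print(great_circle_intersection(equator, meridian))
```

The two returned points are opposite each other, which is also why a pair of antipodal points lies on more than one line.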

For single elliptic geometry, the model is a touch more complicated. Our 
“plane” will now be the upper half-sphere, with points on the boundary 
circle identified with their antipodal points, i.e., 


$$\{(x,y,z) : x^2 + y^2 + z^2 = 1,\ z \ge 0\} / \{(x,y,0) \text{ is identified with } (-x,-y,0)\}.$$


[Figure: lines on the upper half-sphere.]


Thus the point on the boundary $(1/\sqrt{2}, -1/\sqrt{2}, 0)$ is identified with the point
$(-1/\sqrt{2}, 1/\sqrt{2}, 0)$. Our “lines” will be the great half-circles on the half-sphere.
Note that the Elliptic Axiom is satisfied. Further, note that no line will 
separate the plane, since antipodal points on the boundary are identified. 
Thus statements in single elliptic geometry will correspond to statements 
about great half-circles in Euclidean geometry. 


8.4 Curvature 


One of the most basic results in Euclidean geometry is that the sum of the 
angles of a triangle is 180 degrees, or in other words, the sum of two right 
angles.

Recall the proof. Given a triangle with vertices P, Q and R, by Playfair’s
Axiom there is a unique line through R parallel to the line spanned by P
and Q. By results on alternating angles, we see that the angles $\alpha$, $\beta$
and $\gamma$ must sum to that of two right angles.





Note that we needed to use Playfair’s axiom. Thus this result will not 
necessarily be true in non-Euclidean geometries. This seems reasonable if 
we look at the picture of a triangle in the hyperbolic upper half-plane and 
of a triangle on the sphere of double elliptic geometry.


[Figure: a triangle in the hyperbolic upper half-plane and a triangle on the sphere.]


What happens is that in hyperbolic geometry the sum of the angles of
a triangle is less than 180 degrees while, for elliptic geometries, the sum
of the angles of a triangle will be greater than 180 degrees. It can be
shown that the smaller the area of the triangle is, the closer the sum
of the triangle’s angles will be to 180 degrees. This in turn is linked to
the Gaussian curvature. It is the case (though it is not obvious) that
methods of measuring distance (i.e., metrics) can be chosen so that the
different types of geometry will have different Gaussian curvatures. More
precisely, the Gaussian curvature of the Euclidean plane will be zero, of
the hyperbolic plane will be $-1$ and of the elliptic planes will be $1$. Thus
differential geometry and curvature are linked to the axiomatics of different
geometries.
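
The behavior of angle sums on the sphere can be explored numerically. This is a sketch only, assuming the vertices are unit vectors in space and that the angle at a vertex is measured between the tangent directions of the two great-circle sides; the helper names are not from the text. The first triangle is one octant of the sphere, the second is a small triangle near the point (1, 0, 0).

```python
import math

def dot(a, b):
    return sum(s * t for s, t in zip(a, b))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return tuple(t / n for t in v)

def tangent_toward(a, b):
    """Unit tangent vector at a pointing along the great circle toward b."""
    d = dot(a, b)
    return normalize(tuple(bi - d * ai for ai, bi in zip(a, b)))

def angle_at(a, b, c):
    """Angle of the spherical triangle abc at the vertex a, in radians."""
    cos_angle = dot(tangent_toward(a, b), tangent_toward(a, c))
    return math.acos(max(-1.0, min(1.0, cos_angle)))

def angle_sum(a, b, c):
    return angle_at(a, b, c) + angle_at(b, c, a) + angle_at(c, a, b)

octant = angle_sum((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
small = angle_sum(normalize((1.0, 0.01, 0.0)),
                  normalize((1.0, 0.0, 0.01)),
                  normalize((1.0, 0.01, 0.01)))
print(math.degrees(octant), math.degrees(small))
```

The octant triangle has three right angles, while the small triangle has an angle sum only slightly above 180 degrees, in line with the remark that the excess shrinks with the area.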




8.5 Books 


One of the best popular books in mathematics of all time is Hilbert and
Cohn-Vossen’s Geometry and the Imagination [58]. All serious students
should study this book carefully. One of the best geometers of the 1900s (someone
who actually researched in areas that nonmathematicians would recognize
as geometry), Coxeter, wrote a great book, Introduction to Geometry [23].
More standard, straightforward texts on various types of geometry are by
Gans [44], Cederberg [17] and Lang and Murrow [81]. Robin Hartshorne’s
Geometry: Euclid and Beyond [55] is an interesting recent book. Also,
McCleary’s Geometry from a Differentiable Viewpoint [84] is a place to see
both non-Euclidean geometries and the beginnings of differential geometry. 


8.6 Exercises 


1. This problem gives another model for hyperbolic geometry. Our points 
will be the points in the open disc: 


$$D = \{(x,y) : x^2 + y^2 < 1\}.$$


The lines will be the arcs of circles that intersect perpendicularly the bound- 
ary of D. Show that this model satisfies the Hyperbolic Axiom. 


2. Show that the model in problem 1 and the upper half plane model are 
equivalent, if, in the upper half plane, we identify all points at infinity to a 
single point. 
3. Give the analogue of Playfair’s Axiom for planes in space. 
4. Develop the idea of the upper half space so that if P is a “plane” and p 
is a point off of this plane, then there are infinitely many planes containing 
p that do not intersect the plane P. 
5. Here is another model for single elliptic geometry. Start with the unit 
disc 

$$D = \{(x,y) : x^2 + y^2 \le 1\}.$$
Identify antipodal points on the boundary. Thus identify the point $(a,b)$
with the point $(-a,-b)$, provided that $a^2 + b^2 = 1$. Our points will be the
points of the disc, subject to this identification on the boundary.

[Figure: antipodal boundary points (a,b) and (-a,-b) of the disc.]


Lines in this model will be Euclidean lines, provided they start and end at
antipodal points. Show that this model describes a single elliptic geometry.
6. Here is still another model for single elliptic geometry. Let our points 
be lines through the origin in space. Our lines in this geometry will be 
planes through the origin in space. (Note that two lines through the origin 
do indeed span a unique plane.) Show that this model describes a single 
elliptic geometry. 

7. By looking at how a line through the origin in space intersects the top 
half of the unit sphere 


$$\{(x,y,z) : x^2 + y^2 + z^2 = 1 \text{ and } z \ge 0\},$$


show that the model given in problem 6 is equivalent to the model for single 
elliptic geometry given in the text. 


Chapter 9 


Complex Analysis 


Basic Object: The complex numbers 
Basic Map: Analytic functions 





Basic Goal: Equivalences of analytic functions 


Complex analysis in one variable studies a special type of function (called 
analytic or holomorphic) mapping complex numbers to themselves. There 
are a number of seemingly unrelated but equivalent ways for defining an 
analytic function. Each has its advantages; all should be known. 

We will first define analyticity in terms of a limit (in direct analogy 
with the definition of a derivative for a real-valued function). We will then 
see that this limit definition can also be captured by the Cauchy-Riemann 
equations, an amazing set of partial differential equations. Analyticity will 
then be described in terms of relating the function with a particular path 
integral (the Cauchy Integral Formula). Even further, we will see that a 
function is analytic if and only if it can be locally written in terms of a 
convergent power series. We will then see that an analytic function, viewed 
as a map from $\mathbb{R}^2$ to $\mathbb{R}^2$, must preserve angles (which is what the term
conformal means), provided that the function has a nonzero derivative. 
Thus our goal is: 


Theorem 9.0.1 Let $f : U \to \mathbb{C}$ be a function from an open set U of the
complex numbers to the complex numbers. The function f(z) is said to be
analytic if it satisfies any of the following equivalent conditions:

a) For all $z_0 \in U$,
$$\lim_{z \to z_0} \frac{f(z) - f(z_0)}{z - z_0}$$
exists. This limit is denoted by $f'(z_0)$ and is called the complex derivative.

b) The real and imaginary parts of the function f satisfy the Cauchy-Riemann
equations:
$$\frac{\partial \mathrm{Re}(f)}{\partial x} = \frac{\partial \mathrm{Im}(f)}{\partial y}$$
and
$$\frac{\partial \mathrm{Re}(f)}{\partial y} = -\frac{\partial \mathrm{Im}(f)}{\partial x}.$$

c) Let $\sigma$ be a counterclockwise simple loop in U such that every interior
point of $\sigma$ is also in U. If $z_0$ is any complex number in the interior of $\sigma$,
then
$$f(z_0) = \frac{1}{2\pi i} \int_\sigma \frac{f(z)}{z - z_0}\,dz.$$

d) For any complex number $z_0$, there is an open neighborhood in U of
$z_0$ in which
$$f(z) = \sum_{n=0}^{\infty} a_n (z - z_0)^n,$$
a uniformly converging series.

Further, if f is analytic at a point $z_0$ and if $f'(z_0) \neq 0$, then at $z_0$, the
function f is conformal (i.e., angle-preserving), viewed as a map from $\mathbb{R}^2$
to $\mathbb{R}^2$.


There is a basic distinction between real and complex analysis. Real
analysis studies, in essence, differentiable functions; this is not a major re- 
striction on functions at all. Complex analysis studies analytic functions; 
this is a major restriction on the type of functions studied, leading to the 
fact that analytic functions have many amazing and useful properties. An- 
alytic functions appear throughout modern mathematics and physics, with 
applications ranging from the deepest properties of prime numbers to the 
subtlety of fluid flow. Know this subject well. 


9.1 Analyticity as a Limit 


For the rest of this chapter, let U denote an open set of the complex numbers
C.

Let $f : U \to \mathbb{C}$ be a function from our open set U of the complex
numbers to the complex numbers.


Definition 9.1.1 At a point $z_0 \in U$, the function f(z) is analytic (or
holomorphic) if
$$\lim_{z \to z_0} \frac{f(z) - f(z_0)}{z - z_0}$$
exists. This limit is denoted by $f'(z_0)$ and is called the derivative.


Of course, this is equivalent to the limit
$$\lim_{h \to 0} \frac{f(z_0 + h) - f(z_0)}{h}$$
existing for $h \in \mathbb{C}$.

Note that this is exactly the definition for a function $f : \mathbb{R} \to \mathbb{R}$ to
be differentiable if all C’s are replaced by R’s. Many basic properties
of differentiable functions (such as the product rule, sum rule, quotient 
rule, and chain rule) will immediately apply. Hence, from this perspective, 
there does not appear to be anything particularly special about analytic 
functions. But the involved limits are not limits on the real line but limits in 
the real plane. This extra complexity creates profound distinctions between 
real differentiable functions and complex analytic ones, as we will see. 

Our next task is to give an example of a nonholomorphic function. We 
need a little notation. The complex numbers C form a real two dimensional 
vector space. More concretely, each complex number z can be written as 
the sum of a real and imaginary part: 


$$z = x + iy.$$

The complex conjugate of z is
$$\bar{z} = x - iy.$$
Note that the square of the length of the complex number z as a vector in
$\mathbb{R}^2$ is
$$x^2 + y^2 = z\bar{z}.$$
Keeping in tune with this notion of length, the product $z\bar{z}$ is frequently
denoted by:
$$z\bar{z} = |z|^2.$$
Fix the function
$$f(z) = \bar{z} = x - iy.$$

We will see that this function is not holomorphic. The key is that in the
definition we look at the limit as $h \to 0$ but h must be allowed to be any
complex number. Then we must allow h to approach 0 along any path in
C, or in other words, along any path in $\mathbb{R}^2$. We will take the limit along
two different paths and see that we get two different limits, meaning that
$\bar{z}$ is not holomorphic.

For convenience, let $z_0 = 0$. Let h be real valued. Then for this h we
have
$$\lim_{h \to 0} \frac{f(h) - f(0)}{h - 0} = \lim_{h \to 0} \frac{h}{h} = 1.$$
Now let h be imaginary, which we label, with an abuse of notation, by hi,
with h now real. Then the limit will be:
$$\lim_{h \to 0} \frac{f(hi) - f(0)}{hi - 0} = \lim_{h \to 0} \frac{-hi}{hi} = -1.$$

Since the two limits are not equal, the function $\bar{z}$ cannot be a holomorphic
function.
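
The two limits can be checked with Python's built-in complex numbers. This is only an illustrative sketch; the helper `difference_quotient` is a name introduced here, not notation from the text. For comparison, the same quotients for the function $z^2$ at the point 1 + i agree in both directions.

```python
def difference_quotient(f, z0, h):
    """The quotient (f(z0 + h) - f(z0)) / h for a complex step h."""
    return (f(z0 + h) - f(z0)) / h

conj = lambda z: z.conjugate()
square = lambda z: z * z

for h in (1e-3, 1e-6):
    # Approach 0 along the real axis and along the imaginary axis.
    print("conjugate:", difference_quotient(conj, 0j, complex(h, 0)),
          difference_quotient(conj, 0j, complex(0, h)))
    print("square:   ", difference_quotient(square, 1 + 1j, complex(h, 0)),
          difference_quotient(square, 1 + 1j, complex(0, h)))
```

For the conjugate the quotient is 1 along the real axis and -1 along the imaginary axis no matter how small h is, while for $z^2$ both quotients approach the same number 2 + 2i.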


9.2 Cauchy-Riemann Equations


For a function $f : U \to \mathbb{C}$, we can split the image of f into its real and
imaginary parts. Then, using that
$$z = x + iy = (x,y),$$
we can write f(z) = u(z) + iv(z) as
$$f(x,y) = u(x,y) + iv(x,y).$$
For example, if $f(z) = z^2$, we have
$$f(z) = z^2 = (x + iy)^2 = x^2 - y^2 + 2xyi.$$

Then the real and imaginary parts of the function f will be:
$$u(x,y) = x^2 - y^2,$$
$$v(x,y) = 2xy.$$

The goal of this section is to capture the analyticity of the function f by
having the real-valued functions u and v satisfy a special system of partial
differential equations.


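Definition 9.2.1 Real-valued functions $u, v : U \to \mathbb{R}$ satisfy the Cauchy-Riemann
equations if
$$\frac{\partial u(x,y)}{\partial x} = \frac{\partial v(x,y)}{\partial y}$$
and
$$\frac{\partial u(x,y)}{\partial y} = -\frac{\partial v(x,y)}{\partial x}.$$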


Though not at all obvious, this is the most important system of partial 
differential equations in all of mathematics, due to its intimate connection 
with analyticity, described in the following theorem. 


Theorem 9.2.1 A complex-valued function $f(x,y) = u(x,y) + iv(x,y)$ is
analytic at a point $z_0 = x_0 + iy_0$ if and only if the real-valued functions
$u(x,y)$ and $v(x,y)$ satisfy the Cauchy-Riemann equations at $z_0$.


We will show that analyticity implies the Cauchy-Riemann equations
and then that the Cauchy-Riemann equations, coupled with the condition
that the partial derivatives $\frac{\partial u}{\partial x}$, $\frac{\partial u}{\partial y}$, $\frac{\partial v}{\partial x}$ and $\frac{\partial v}{\partial y}$ are continuous, imply
analyticity. This extra assumption requiring the continuity of the various partials
is not needed, but without it the proof is quite a bit harder.
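
Before the proof, it may help to test the equations numerically. The sketch below estimates the four partial derivatives by central differences and reports the two quantities that the Cauchy-Riemann equations require to vanish; the step size and the function names are assumptions made for illustration.

```python
def partials(f, x, y, eps=1e-6):
    """Central-difference estimates of (u_x, u_y, v_x, v_y) for f = u + iv."""
    fx = (f(complex(x + eps, y)) - f(complex(x - eps, y))) / (2 * eps)
    fy = (f(complex(x, y + eps)) - f(complex(x, y - eps))) / (2 * eps)
    return fx.real, fy.real, fx.imag, fy.imag

def cauchy_riemann_residuals(f, x, y):
    """Return (u_x - v_y, u_y + v_x); both are zero when the equations hold."""
    ux, uy, vx, vy = partials(f, x, y)
    return ux - vy, uy + vx

print(cauchy_riemann_residuals(lambda z: z * z, 1.0, 2.0))
print(cauchy_riemann_residuals(lambda z: z.conjugate(), 1.0, 2.0))
```

For $z^2$ both residuals are zero up to rounding, matching $u = x^2 - y^2$ and $v = 2xy$. For the conjugate, where u = x and v = -y, the first residual is 2, so the equations fail.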


Proof: We first assume that at a point $z_0 = x_0 + iy_0$,
$$\lim_{h \to 0} \frac{f(z_0 + h) - f(z_0)}{h}$$
exists, with the limit denoted as usual by $f'(z_0)$. The key is that the number
h is a complex number. Thus when we require the above limit to exist as
h approaches zero, the limit must exist along any path in the plane for h
approaching zero.

[Figure: possible paths to $z_0$.]

The Cauchy-Riemann equations will follow by choosing different paths for
h.
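First, assume that h is real. Then
$$f(z_0 + h) = f(x_0 + h, y_0) = u(x_0 + h, y_0) + iv(x_0 + h, y_0).$$

By the definition of analytic function,
$$\begin{aligned}
f'(z_0) &= \lim_{h \to 0} \frac{f(z_0 + h) - f(z_0)}{h} \\
&= \lim_{h \to 0} \frac{u(x_0 + h, y_0) + iv(x_0 + h, y_0) - (u(x_0, y_0) + iv(x_0, y_0))}{h} \\
&= \lim_{h \to 0} \frac{u(x_0 + h, y_0) - u(x_0, y_0)}{h} + i \lim_{h \to 0} \frac{v(x_0 + h, y_0) - v(x_0, y_0)}{h} \\
&= \frac{\partial u}{\partial x}(x_0, y_0) + i\frac{\partial v}{\partial x}(x_0, y_0),
\end{aligned}$$
by the definition of partial derivatives.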
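Now assume that h is always purely imaginary. For ease of notation we
denote h by hi, h now real. Then
$$f(z_0 + hi) = f(x_0, y_0 + h) = u(x_0, y_0 + h) + iv(x_0, y_0 + h).$$
We have, for the same complex number $f'(z_0)$ as before,
$$\begin{aligned}
f'(z_0) &= \lim_{h \to 0} \frac{f(z_0 + ih) - f(z_0)}{ih} \\
&= \lim_{h \to 0} \frac{u(x_0, y_0 + h) + iv(x_0, y_0 + h) - (u(x_0, y_0) + iv(x_0, y_0))}{ih} \\
&= \frac{1}{i} \lim_{h \to 0} \frac{u(x_0, y_0 + h) - u(x_0, y_0)}{h} + \lim_{h \to 0} \frac{v(x_0, y_0 + h) - v(x_0, y_0)}{h} \\
&= -i\frac{\partial u}{\partial y}(x_0, y_0) + \frac{\partial v}{\partial y}(x_0, y_0),
\end{aligned}$$
by the definition of partial differentiation and since $\frac{1}{i} = -i$.
But these two limits are both equal to the same complex number $f'(z_0)$.
Hence
$$\frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x} = -i\frac{\partial u}{\partial y} + \frac{\partial v}{\partial y}.$$
Since $\frac{\partial u}{\partial x}$, $\frac{\partial u}{\partial y}$, $\frac{\partial v}{\partial x}$ and $\frac{\partial v}{\partial y}$ are all real-valued functions, we must have
$$\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y}$$
$$\frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x},$$
the Cauchy-Riemann equations.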

Before we can prove that the Cauchy-Riemann equations (plus the extra 
assumption of continuity on the partial derivatives) imply that f(z) is an- 
alytic, we need to describe how complex multiplication can be interpreted 
as a linear map from $\mathbb{R}^2$ to $\mathbb{R}^2$ (and hence as a $2 \times 2$ matrix).

Fix a complex number a+bi. Then for any other complex number x+iy,
we have
$$(a + bi)(x + iy) = (ax - by) + i(ay + bx).$$

Representing x + iy as a vector $\begin{pmatrix} x \\ y \end{pmatrix}$ in $\mathbb{R}^2$, we see that multiplication by
a + bi corresponds to the matrix multiplication
$$\begin{pmatrix} a & -b \\ b & a \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} ax - by \\ bx + ay \end{pmatrix}.$$
As can be seen, not all linear transformations $\begin{pmatrix} A & B \\ C & D \end{pmatrix} : \mathbb{R}^2 \to \mathbb{R}^2$ correspond
to multiplication by a complex number. In fact, from the above we have


Lemma 9.2.1 The matrix
$$\begin{pmatrix} A & B \\ C & D \end{pmatrix}$$
corresponds to multiplication by a complex number a + bi if and only if
$A = D = a$ and $B = -C = -b$.
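
The lemma is easy to experiment with. The following sketch stores a $2 \times 2$ matrix as a pair of rows; the helper names `as_matrix`, `apply` and `is_complex_multiplication` are illustrative and not part of the text.

```python
def as_matrix(w):
    """The 2 x 2 real matrix representing multiplication by w = a + bi."""
    a, b = w.real, w.imag
    return ((a, -b), (b, a))

def apply(m, v):
    """Multiply the matrix m by the column vector v = (x, y)."""
    (A, B), (C, D) = m
    x, y = v
    return (A * x + B * y, C * x + D * y)

def is_complex_multiplication(m):
    """Test the condition of the lemma: A = D and B = -C."""
    (A, B), (C, D) = m
    return A == D and B == -C

w, z = complex(2, 3), complex(5, -1)
product = w * z
print(apply(as_matrix(w), (z.real, z.imag)), (product.real, product.imag))
print(is_complex_multiplication(((1, 0), (0, -1))))
```

The last line tests the matrix of complex conjugation, $(x, y) \mapsto (x, -y)$. It fails the condition of the lemma, which is another way to see that conjugation is linear over the reals but is not multiplication by any complex number.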


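Now we can return to the other direction of the theorem. First write
our function $f : \mathbb{C} \to \mathbb{C}$ as a map $f : \mathbb{R}^2 \to \mathbb{R}^2$ by
$$f(x,y) = \begin{pmatrix} u(x,y) \\ v(x,y) \end{pmatrix}.$$
As described in Chapter Three, the Jacobian of f is the unique matrix
$$Df = \begin{pmatrix} \frac{\partial u}{\partial x}(x_0, y_0) & \frac{\partial u}{\partial y}(x_0, y_0) \\ \frac{\partial v}{\partial x}(x_0, y_0) & \frac{\partial v}{\partial y}(x_0, y_0) \end{pmatrix}$$
satisfying
$$\lim_{(x,y) \to (x_0,y_0)} \frac{\left| \begin{pmatrix} u(x,y) \\ v(x,y) \end{pmatrix} - \begin{pmatrix} u(x_0,y_0) \\ v(x_0,y_0) \end{pmatrix} - Df \cdot \begin{pmatrix} x - x_0 \\ y - y_0 \end{pmatrix} \right|}{|(x - x_0, y - y_0)|} = 0.$$
But the Cauchy-Riemann equations, $\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y}$ and $\frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}$, tell us that
this Jacobian represents multiplication by a complex number. Call this
complex number $f'(z_0)$. Then, using that $z = x + iy$ and $z_0 = x_0 + iy_0$, we
can rewrite the above limit as
$$\lim_{z \to z_0} \left| \frac{f(z) - f(z_0) - f'(z_0)(z - z_0)}{z - z_0} \right| = 0.$$

This must also hold without the absolute value signs and hence
$$0 = \lim_{z \to z_0} \frac{f(z) - f(z_0) - f'(z_0)(z - z_0)}{z - z_0} = \lim_{z \to z_0} \frac{f(z) - f(z_0)}{z - z_0} - f'(z_0).$$
Thus
$$f'(z_0) = \lim_{z \to z_0} \frac{f(z) - f(z_0)}{z - z_0}$$
will always exist, meaning that the function $f : \mathbb{C} \to \mathbb{C}$ is analytic. □


9.3 Integral Representations of Functions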


Analytic functions can also be defined in terms of path integrals about 
closed loops in C. This means that we will be writing analytic functions as 
integrals, which is what is meant by the term integral representation. We 
will see that for a closed loop $\sigma$,


the values of an analytic function on interior points are determined from the 
values of the function on the boundary, which places strong restrictions on 
what analytic functions can be. The consequences of this integral represen- 
tation of analytic functions range from the beginnings of homology theory 
to the calculation of difficult real-valued integrals (using residue theorems). 

We first need some preliminaries on path integrals and Green’s Theorem. 
Let $\sigma$ be a path in our open set U. In other words, $\sigma$ is the image of a
differentiable map
$$\sigma : [0,1] \to U.$$

[Figure: a path from $\sigma(0) = (x(0), y(0))$ to $\sigma(1) = (x(1), y(1))$.]


Writing $\sigma(t) = (x(t), y(t))$, with x denoting the real coordinate of C and y
the imaginary coordinate, we have:


Definition 9.3.1 If P(x,y) and Q(x,y) are real-valued functions defined
on an open subset U of $\mathbb{R}^2 = \mathbb{C}$, then
$$\int_\sigma P\,dx + Q\,dy = \int_0^1 P(x(t), y(t)) \frac{dx}{dt}\,dt + \int_0^1 Q(x(t), y(t)) \frac{dy}{dt}\,dt.$$

If $f : U \to \mathbb{C}$ is a function written as
$$f(z) = f(x,y) = u(x,y) + iv(x,y) = u(z) + iv(z),$$
then

Definition 9.3.2 The path integral $\int_\sigma f(z)\,dz$ is defined by
$$\begin{aligned}
\int_\sigma f(z)\,dz &= \int_\sigma (u(x,y) + iv(x,y))(dx + i\,dy) \\
&= \int_\sigma (u(x,y) + iv(x,y))\,dx + \int_\sigma (iu(x,y) - v(x,y))\,dy.
\end{aligned}$$


The goal of this section is to see that these path integrals have a number 
of special properties when the function f is analytic. 

A path $\sigma$ is a closed loop in U if there is a parametrization $\sigma : [0,1] \to U$
with $\sigma(0) = \sigma(1)$.

[Figure: a closed loop with $\sigma(0) = \sigma(1)$.]

Note that we are using the same symbol for the actual path and for the
parametrization function. The loop is simple if $\sigma(t) \neq \sigma(s)$ for all $s \neq t$,
except for when t or s is zero or one.

[Figure: a simple loop and a loop that is not simple.]

We will require all of our simple loops to be parametrized so that they are
counterclockwise around their interior. For example, the unit circle is a
counterclockwise simple loop, with parametrization
$$\sigma(t) = (\cos(2\pi t), \sin(2\pi t)).$$

[Figure: the unit circle with this parametrization.]


We will be interested in the path integrals of analytic functions around
counterclockwise simple loops. Luckily, there are two key, easy examples
that demonstrate the general results. Both of these examples will be
integrals about the unit circle. Consider the function $f : \mathbb{C} \to \mathbb{C}$ defined
by
$$f(z) = z = x + iy.$$

Then
$$\begin{aligned}
\int_\sigma f(z)\,dz &= \int_\sigma z\,dz \\
&= \int_\sigma (x + iy)(dx + i\,dy) \\
&= \int_\sigma (x + iy)\,dx + \int_\sigma (xi - y)\,dy \\
&= \int_0^1 (\cos(2\pi t) + i\sin(2\pi t)) \frac{d}{dt}\cos(2\pi t)\,dt \\
&\quad + \int_0^1 (i\cos(2\pi t) - \sin(2\pi t)) \frac{d}{dt}\sin(2\pi t)\,dt \\
&= 0,
\end{aligned}$$
when the integral is worked out.


On the other hand, consider the function $f(z) = \frac{1}{z}$. On the unit circle
we have $|z|^2 = z\bar{z} = 1$ and hence $\frac{1}{z} = \bar{z}$. Then
$$\int_\sigma f(z)\,dz = \int_\sigma \frac{dz}{z} = \int_\sigma \bar{z}\,dz = \int_\sigma (\cos(2\pi t) - i\sin(2\pi t))(dx + i\,dy) = 2\pi i,$$
when the calculation is performed. We will soon see that the reason that the
path integral $\int_\sigma \frac{dz}{z}$ equals $2\pi i$ for the unit circle is that the function $\frac{1}{z}$ is not
well-defined in the interior of the circle (namely at the origin). Otherwise
the integral would be zero, as in the first example. Again, though not at
all apparent, these are the two key examples.
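
Both calculations can be confirmed numerically. The sketch below approximates a path integral around a counterclockwise circle by a Riemann sum in the parameter t; the function name `loop_integral` and the number of sample points are assumptions made for illustration.

```python
import cmath
import math

def loop_integral(f, center=0j, radius=1.0, n=2000):
    """Approximate the integral of f around a counterclockwise circle."""
    total = 0j
    for k in range(n):
        t = k / n
        z = center + radius * cmath.exp(2j * math.pi * t)
        dz_dt = 2j * math.pi * radius * cmath.exp(2j * math.pi * t)
        total += f(z) * dz_dt / n
    return total

print(loop_integral(lambda z: z))        # close to 0
print(loop_integral(lambda z: 1 / z))    # close to 2*pi*i
```

Moving the circle so that the origin is no longer inside it, for example with `center=3`, makes the second integral vanish as well, which is consistent with the explanation given for the value $2\pi i$.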

The following theorems will show that the path integral of an analytic 
function about a closed loop will always be zero if the function is also 
analytic on the interior of the loop. 

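We will need, though, Green’s Theorem:


Theorem 9.3.1 (Green’s Theorem) Let $\sigma$ be a counterclockwise simple
loop in C and $\Omega$ its interior. If P(x,y) and Q(x,y) are two real-valued
differentiable functions, then
$$\int_\sigma P\,dx + Q\,dy = \iint_\Omega \left( \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} \right) dx\,dy.$$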


The proof is exercise 5 in Chapter Five. 
Now on to Cauchy’s Theorem: 


Theorem 9.3.2 (Cauchy’s Theorem) Let $\sigma$ be a counterclockwise simple
loop in an open set U such that every point in the interior of $\sigma$ is
contained in U. If $f : U \to \mathbb{C}$ is an analytic function, then
$$\int_\sigma f(z)\,dz = 0.$$


Viewing the path integral $\int_\sigma f(z)\,dz$ as some sort of average of the values
of f(z) along the loop $\sigma$, this theorem is stating the average value is zero
for an analytic f. By the way, this theorem is spectacularly false for most
functions, showing that those that are analytic are quite special.

Proof: (under the additional hypothesis, which can be removed with some
work, that the complex derivative $f'(z)$ is continuous).

Write f(z) = u(z) + iv(z), with u(z) and v(z) real-valued functions.
Since f(z) is analytic we know that the Cauchy-Riemann equations hold:
$$\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y}$$
and
$$-\frac{\partial u}{\partial y} = \frac{\partial v}{\partial x}.$$
Now
$$\begin{aligned}
\int_\sigma f(z)\,dz &= \int_\sigma (u + iv)(dx + i\,dy) \\
&= \int_\sigma (u\,dx - v\,dy) + i \int_\sigma (u\,dy + v\,dx) \\
&= \iint_\Omega \left( -\frac{\partial v}{\partial x} - \frac{\partial u}{\partial y} \right) dx\,dy + i \iint_\Omega \left( \frac{\partial u}{\partial x} - \frac{\partial v}{\partial y} \right) dx\,dy,
\end{aligned}$$
by Green’s Theorem, where as before $\Omega$ denotes the interior of the closed
loop $\sigma$. But this path integral must be zero by the Cauchy-Riemann
equations. □

Note that while the actual proof of Cauchy’s Theorem was short, it used 
two major earlier results, namely the equivalence of the Cauchy-Riemann 
equations with analyticity and Green’s Theorem. 

This theorem is at the heart of all integral-type properties for analytic 
functions. For example, this theorem leads (nontrivially) to the following, 
which we will not prove: 


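Theorem 9.3.3 Let $f : U \to \mathbb{C}$ be analytic in an open set U and let $\sigma$
and $\hat{\sigma}$ be two simple loops so that $\sigma$ can be continuously deformed to $\hat{\sigma}$ in
U (i.e., $\sigma$ and $\hat{\sigma}$ are homotopic in U). Then
$$\int_\sigma f(z)\,dz = \int_{\hat{\sigma}} f(z)\,dz.$$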


Intuitively, two loops are homotopic in a region U if one can be continuously
deformed into the other within U. Thus

[Figure: three loops $\sigma_1$, $\sigma_2$ and $\sigma_3$ in a region U.]

$\sigma_1$ and $\sigma_2$ are homotopic to each other in the region U but not to $\sigma_3$ in this
region (though all three are homotopic to each other in C). The technical
definition is:


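Definition 9.3.3 Two paths $\sigma_1$ and $\sigma_2$ are homotopic in a region U if
there is a continuous map
$$T : [0,1] \times [0,1] \to U$$
with
$$T(t,0) = \sigma_1(t)$$
and
$$T(t,1) = \sigma_2(t).$$

[Figure: the homotopy T deforming $\sigma_1(t) = T(t,0)$ into $\sigma_2(t) = T(t,1)$.]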


In the statement of Cauchy’s Theorem, the requirement that all of the
points in the interior of the closed loop $\sigma$ be in the open set U can be
restated as requiring that the loop $\sigma$ is homotopic to a point in U.

We also need the notion of simply connected. A set U in C is simply
connected if every closed loop in U is homotopic in U to a single point.
Intuitively, U is simply connected if U contains the interior points of every
closed loop in U. For example, the complex numbers C is simply connected,
but $\mathbb{C} - (0,0)$ is not simply connected, since $\mathbb{C} - (0,0)$ does not contain the
unit disc, even though it does contain the unit circle.

We will soon need the following slight generalization of Cauchy’s
Theorem:


Proposition 9.3.1 Let U be a simply connected open set in C. Let
$f : U \to \mathbb{C}$ be analytic except possibly at a point $z_0$ but continuous everywhere.
Let $\sigma$ be any counterclockwise simple loop in U. Then
$$\int_\sigma f(z)\,dz = 0.$$

The proof is similar to that of Cauchy’s Theorem; the extension is that we
have to guarantee that all still works even if the point $z_0$ lies on the loop $\sigma$.
All of these lead to:


Theorem 9.3.4 (Cauchy Integral Formula) Let $f : U \to \mathbb{C}$ be analytic
on a simply connected open set U in C and let $\sigma$ be a counterclockwise
simple loop in U. Then for any point $z_0$ in the interior of $\sigma$, we have
$$f(z_0) = \frac{1}{2\pi i} \int_\sigma \frac{f(z)}{z - z_0}\,dz.$$





The meaning of this theorem is that the value of the analytic function f at 
any point in the interior of a region can be obtained by knowing the values 
of f on the boundary curve. 
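
As an illustration, the formula can be tested for the exponential function, using only its values on the unit circle. This is a numerical sketch with assumed names and sample counts, not a proof.

```python
import cmath
import math

def cauchy_integral(f, z0, center=0j, radius=1.0, n=4000):
    """Approximate (1 / (2 pi i)) * integral of f(z) / (z - z0) dz over a circle."""
    total = 0j
    for k in range(n):
        theta = 2 * math.pi * k / n
        z = center + radius * cmath.exp(1j * theta)
        dz = 1j * radius * cmath.exp(1j * theta) * (2 * math.pi / n)
        total += f(z) / (z - z0) * dz
    return total / (2j * math.pi)

z0 = 0.3 + 0.2j   # a point inside the unit circle
print(cauchy_integral(cmath.exp, z0), cmath.exp(z0))
```

The function is sampled only at points of the circle, yet the integral recovers its value at the interior point `z0`.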

Proof: Define a new function g(z) by setting
$$g(z) = \frac{f(z) - f(z_0)}{z - z_0}$$
when $z \neq z_0$ and setting
$$g(z) = f'(z_0)$$
when $z = z_0$.
Since f(z) is analytic at $z_0$, by definition we have
$$f'(z_0) = \lim_{z \to z_0} \frac{f(z) - f(z_0)}{z - z_0},$$
meaning that the new function g(z) is continuous everywhere and analytic
everywhere except for possibly at $z_0$.

Then by Proposition 9.3.1 we have $\int_\sigma g(z)\,dz = 0$. Thus
$$0 = \int_\sigma \frac{f(z) - f(z_0)}{z - z_0}\,dz = \int_\sigma \frac{f(z)}{z - z_0}\,dz - \int_\sigma \frac{f(z_0)}{z - z_0}\,dz.$$
Then
$$\int_\sigma \frac{f(z)}{z - z_0}\,dz = \int_\sigma \frac{f(z_0)}{z - z_0}\,dz = f(z_0) \int_\sigma \frac{1}{z - z_0}\,dz,$$
since $f(z_0)$ is just a fixed complex number. But this path integral is just
our desired $2\pi i f(z_0)$, by direct calculation, after deforming our simple loop
$\sigma$ to a circle centered at $z_0$. □

In fact, the converse is also true. 


Theorem 9.3.5 Let $\sigma$ be a counterclockwise simple loop and $f : \sigma \to \mathbb{C}$
any continuous function on the loop $\sigma$. Extend the function f to the interior
of the loop $\sigma$ by setting
$$f(z_0) = \frac{1}{2\pi i} \int_\sigma \frac{f(z)}{z - z_0}\,dz$$
for points $z_0$ in the interior. Then f(z) is analytic on the interior of $\sigma$.
Further, f is infinitely differentiable with
$$f^{(k)}(z_0) = \frac{k!}{2\pi i} \int_\sigma \frac{f(z)}{(z - z_0)^{k+1}}\,dz.$$

Though a general proof is in most books on complex analysis, we will
only sketch why the derivative $f'(z_0)$ is capable of being written as the path
integral
$$\frac{1}{2\pi i} \int_\sigma \frac{f(z)}{(z - z_0)^2}\,dz.$$

For ease of notation, we write
$$f(z) = \frac{1}{2\pi i} \int_\sigma \frac{f(w)}{w - z}\,dw.$$
Then
$$\begin{aligned}
\frac{d}{dz} f(z) &= \frac{1}{2\pi i} \frac{d}{dz} \left( \int_\sigma \frac{f(w)}{w - z}\,dw \right) \\
&= \frac{1}{2\pi i} \int_\sigma \frac{d}{dz} \left( \frac{f(w)}{w - z} \right) dw \\
&= \frac{1}{2\pi i} \int_\sigma \frac{f(w)}{(w - z)^2}\,dw
\end{aligned}$$
as desired.

Note that in this theorem we are not assuming that the original function
$f : \sigma \to \mathbb{C}$ was analytic. In fact the theorem is saying that any continuous
function on a simple loop can be used to define an analytic function on the
interior. The reason that this can only be called a sketch of a proof is
that we did not justify the pulling of the derivative $\frac{d}{dz}$ inside of the integral.


9.4 Analytic Functions as Power Series 


Polynomials $a_n z^n + a_{n-1} z^{n-1} + \cdots + a_0$ are great functions to work with. In
particular they are easy to differentiate and to integrate. Life would be easy
if all we ever had to be concerned with were polynomials. But this is not
the case. Even basic functions such as $e^z$, $\log(z)$ and the trig functions are
just not polynomials. Luckily though, all of these functions are analytic,
which we will see in this section means that they are almost polynomials,
or more accurately, glorified polynomials, which go by the more common
name of power series. In particular the goal of this section is to prove:


Theorem 9.4.1 Let U be an open set in C. A function $f : U \to \mathbb{C}$ is
analytic at $z_0$ if and only if in a neighborhood of $z_0$, f(z) is equal to a
uniformly convergent power series, i.e.,
$$f(z) = \sum_{n=0}^{\infty} a_n (z - z_0)^n.$$


Few functions are equal to uniformly convergent power series (these “glo- 
rified polynomials”). Thus we will be indeed showing that an analytic 
function can be described as such a glorified polynomial. 

Note that if
$$f(z) = \sum_{n=0}^{\infty} a_n (z - z_0)^n = a_0 + a_1(z - z_0) + a_2(z - z_0)^2 + \cdots,$$
we have that
$$f(z_0) = a_0,$$
$$f'(z_0) = a_1,$$
$$f''(z_0) = 2a_2,$$
$$\vdots$$
$$f^{(k)}(z_0) = k!\,a_k.$$
Thus, if $f(z) = \sum_{n=0}^{\infty} a_n (z - z_0)^n$, we have
$$f(z) = \sum_{n=0}^{\infty} \frac{f^{(n)}(z_0)}{n!} (z - z_0)^n,$$
the function’s Taylor series. In other words, the above theorem is simply
stating that an analytic function is equal to its Taylor series.
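
For a concrete instance, take the exponential function, all of whose derivatives at $z_0$ equal $e^{z_0}$, so that $a_n = e^{z_0}/n!$. The sketch below compares a partial sum of the Taylor series with the library value; the function name and the number of terms are illustrative assumptions.

```python
import cmath
import math

def taylor_exp(z, z0=0j, terms=30):
    """Partial sum of the Taylor series of exp about z0.

    Every derivative of exp at z0 equals exp(z0), so a_n = exp(z0) / n!.
    """
    base = cmath.exp(z0)
    return sum(base / math.factorial(n) * (z - z0) ** n for n in range(terms))

z = 1.0 + 2.0j
print(taylor_exp(z), cmath.exp(z))
print(taylor_exp(z, z0=0.5j), cmath.exp(z))
```

Expanding about a different center changes every coefficient but not the value of the sum, as the theorem predicts.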

We first show that any uniformly convergent power series defines an 
analytic function by reviewing quickly some basic facts about power series 
and then sketching a proof. 

Recall the definition of uniform convergence, given in Chapter Three. 


Definition 9.4.1 Let U be a subset of the complex numbers C. A sequence
of functions, $f_n : U \to \mathbb{C}$, converges uniformly to a function $f : U \to \mathbb{C}$ if
given any $\epsilon > 0$, there is some positive integer N such that for all $n \ge N$,
$$|f_n(z) - f(z)| < \epsilon,$$
for all points z in U.


In other words, we are guaranteed that eventually all the functions $f_n(z)$
will fall within any $\epsilon$-tube about the limit function f(z).

The importance of uniform convergence for us is the following theorem, 
which we will not prove here: 


Theorem 9.4.2 Let the sequence $\{f_n(z)\}$ of analytic functions converge
uniformly on an open set U to a function $f : U \to \mathbb{C}$. Then the function
f(z) is also analytic and the sequence of derivatives $(f_n'(z))$ will converge
pointwise to the derivative $f'(z)$ on the set U.


Now that we have a definition for a sequence of functions to converge 
uniformly, we can make sense out of what it would mean for a series of func- 
tions to converge uniformly, via translating series statements into sequence 
statements using the partial sums of the series. 


Definition 9.4.2 A series $\sum_{n=0}^{\infty} a_n (z - z_0)^n$, for complex numbers $a_n$ and
$z_0$, converges uniformly in an open set U of the complex numbers C if the
sequence of polynomials $\left\{ \sum_{n=0}^{N} a_n (z - z_0)^n \right\}$ converges uniformly in U.


By the above theorem and since polynomials are analytic, we can
conclude that if
$$f(z) = \sum_{n=0}^{\infty} a_n (z - z_0)^n$$
is a uniformly convergent series, then the function f(z) is analytic.

Now to sketch why any analytic function can be written as a uniformly
convergent power series. The Cauchy Integral Formula from last section
will be critical.


Start with a function f which is analytic about a point $z_0$. Choose a
simple loop $\sigma$ about $z_0$. By the Cauchy Integral Formula,

$$f(z) = \frac{1}{2\pi i} \int_\sigma \frac{f(w)}{w - z}\, dw,$$

for any z inside $\sigma$.





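Knowing that the geometric series is

$$\sum_{n=0}^{\infty} r^n = \frac{1}{1 - r},$$

for |r| < 1, we see that, for all w and z with $|z - z_0| < |w - z_0|$, we have

$$\frac{1}{w - z} = \frac{1}{w - z_0} \cdot \frac{1}{1 - \frac{z - z_0}{w - z_0}} = \frac{1}{w - z_0} \sum_{n=0}^{\infty} \left( \frac{z - z_0}{w - z_0} \right)^n.$$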

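Restrict the numbers w to lie on the loop $\sigma$. Then for those complex
numbers z with $|z - z_0| < |w - z_0|$,

[Figure: the loop $\sigma$ about $z_0$ and the disc {z such that $|z - z_0| < \mathrm{dist}(z_0, \sigma)$}.]

we have

$$\begin{aligned}
f(z) &= \frac{1}{2\pi i} \int_\sigma \frac{f(w)}{w - z}\, dw \\
&= \frac{1}{2\pi i} \int_\sigma \frac{f(w)}{w - z_0} \cdot \frac{1}{1 - \frac{z - z_0}{w - z_0}}\, dw \\
&= \frac{1}{2\pi i} \int_\sigma \frac{f(w)}{w - z_0} \sum_{n=0}^{\infty} \left( \frac{z - z_0}{w - z_0} \right)^n dw \\
&= \sum_{n=0}^{\infty} \left( \frac{1}{2\pi i} \int_\sigma \frac{f(w)}{(w - z_0)^{n+1}}\, dw \right) (z - z_0)^n \\
&= \sum_{n=0}^{\infty} \frac{f^{(n)}(z_0)}{n!} (z - z_0)^n,
\end{aligned}$$

a convergent power series.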
Of course the above is not quite rigorous, since we did not justify the
switching of the integral with the sum. It follows, nontrivially, from the
fact that the series $\sum_{n=0}^{\infty} \left( \frac{z - z_0}{w - z_0} \right)^n$ converges uniformly.


Note that we have also used the Cauchy Integral Formula, namely that

$$f^{(n)}(z_0) = \frac{n!}{2\pi i} \int_\sigma \frac{f(w)}{(w - z_0)^{n+1}}\, dw.$$
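
The coefficient formula in this sketch, namely that the coefficient of $(z - z_0)^n$ is $\frac{1}{2\pi i} \int_\sigma \frac{f(w)}{(w - z_0)^{n+1}}\, dw$, can be tried out numerically. The following is only a sketch under stated assumptions: the loop is taken to be a circle about $z_0$, the integral is approximated by an equally spaced sum, and the test function $e^z$ and the name `taylor_coefficient` are our own choices.

```python
import cmath
import math

def taylor_coefficient(f, z0, n, radius=1.0, samples=256):
    """Approximate (1 / (2 pi i)) * integral of f(w) / (w - z0)^(n+1) dw
    over the circle |w - z0| = radius, using equally spaced sample points."""
    total = 0
    for k in range(samples):
        theta = 2 * math.pi * k / samples
        offset = radius * cmath.exp(1j * theta)  # this is w - z0
        # On the circle dw = i * offset * dtheta, so the integrand times
        # dw / (2 pi i) reduces to f(w) / offset^n * dtheta / (2 pi).
        total += f(z0 + offset) / offset**n
    return total / samples

if __name__ == "__main__":
    for n in range(6):
        approx = taylor_coefficient(cmath.exp, 0, n)
        print(n, approx.real, 1 / math.factorial(n))
```

For $f(z) = e^z$ about $z_0 = 0$ the computed values should agree closely with the familiar Taylor coefficients $1/n!$.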


9.5 Conformal Maps


We now want to show that analytic functions are also quite special when one 
looks at the geometry of maps from $\mathbf{R}^2$ to $\mathbf{R}^2$. After defining conformal
maps (the technical name for those maps that preserve angles), we will 
show that an analytic function will be conformal at those points where its 
derivative is nonzero. This will be seen to follow almost immediately from 
the Cauchy-Riemann equations. 
Before defining angle-preserving, we need a description for the angle
between curves. Let

$$\sigma_1 : [-1, 1] \to \mathbf{R}^2,$$

with $\sigma_1(t) = (x_1(t), y_1(t))$, and

$$\sigma_2 : [-1, 1] \to \mathbf{R}^2,$$

with $\sigma_2(t) = (x_2(t), y_2(t))$, be two differentiable curves in the plane which
intersect at

$$\sigma_1(0) = \sigma_2(0).$$

The angle between the two curves is defined to be the angle between the
curves’ tangent vectors.






[Figure: the angle between $\sigma_1$ and $\sigma_2$ at their point of intersection.]


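Thus we are interested in the dot product between the tangent vectors of
the curves:

$$\frac{d\sigma_1}{dt} \cdot \frac{d\sigma_2}{dt} = \left( \frac{dx_1}{dt}, \frac{dy_1}{dt} \right) \cdot \left( \frac{dx_2}{dt}, \frac{dy_2}{dt} \right) = \frac{dx_1}{dt} \frac{dx_2}{dt} + \frac{dy_1}{dt} \frac{dy_2}{dt}.$$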


Definition 9.5.1 A function f(x,y) = (u(x, y), v(x, y)) will be conformal
at a point $(x_0, y_0)$ if the angle between any two curves intersecting at $(x_0, y_0)$
is preserved, i.e., the angle between curves $\sigma_1$ and $\sigma_2$ is equal to the angle
between the image curves $f(\sigma_1)$ and $f(\sigma_2)$.
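
Thus

[Figure: curves $\sigma_1$ and $\sigma_2$ and their images $f(\sigma_1)$ and $f(\sigma_2)$, meeting at the same angle.]

is conformal while

[Figure: curves $\sigma_1$ and $\sigma_2$ and their images $f(\sigma_1)$ and $f(\sigma_2)$, labeled "not conformal".]

is not.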


Theorem 9.5.1 An analytic function f(z) whose derivative at the point
$z_0$ is not zero will be conformal at $z_0$.


Proof: The tangent vectors are transformed under the map f by multiplying
them by the two-by-two Jacobian matrix for f. Thus we want to
show that multiplication by the Jacobian preserves angles. Writing f in its
real and imaginary parts, with z = x + iy, as

$$f(z) = f(x, y) = u(x, y) + i v(x, y),$$

the Jacobian of f at the point $z_0 = (x_0, y_0)$ will be

$$Df(x_0, y_0) = \begin{pmatrix} \frac{\partial u}{\partial x}(x_0, y_0) & \frac{\partial u}{\partial y}(x_0, y_0) \\ \frac{\partial v}{\partial x}(x_0, y_0) & \frac{\partial v}{\partial y}(x_0, y_0) \end{pmatrix}.$$
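
But the function f is analytic at the point $z_0$ and hence the Cauchy-Riemann
equations

$$\frac{\partial u}{\partial x}(x_0, y_0) = \frac{\partial v}{\partial y}(x_0, y_0)$$

$$-\frac{\partial u}{\partial y}(x_0, y_0) = \frac{\partial v}{\partial x}(x_0, y_0)$$

hold, allowing us to write the Jacobian as

$$Df(x_0, y_0) = \begin{pmatrix} \frac{\partial u}{\partial x}(x_0, y_0) & \frac{\partial u}{\partial y}(x_0, y_0) \\ -\frac{\partial u}{\partial y}(x_0, y_0) & \frac{\partial u}{\partial x}(x_0, y_0) \end{pmatrix}.$$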


Note that the columns of this matrix are orthogonal (i.e., their dot product
is zero) and have the same length, which is nonzero since $f'(z_0) \neq 0$. Thus
the Jacobian is a rotation followed by a scaling, and so multiplication by the
Jacobian will preserve angle. We can also show this by explicitly multiplying
the Jacobian by the two tangent vectors $\frac{d\sigma_1}{dt}$ and $\frac{d\sigma_2}{dt}$ and then checking that the
angle between $\frac{d\sigma_1}{dt}$ and $\frac{d\sigma_2}{dt}$ is equal to the angle between the image
tangent vectors. □
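
The explicit check mentioned at the end of the proof can be sketched numerically. This is a minimal illustration under our own assumptions: we take $f(z) = z^2$, so that $u = x^2 - y^2$ and $v = 2xy$, and the helper names below are hypothetical. It applies the Jacobian to two tangent vectors and compares the angle before and after.

```python
import math

def jacobian(x, y):
    """Jacobian of f(z) = z^2, that is, of (u, v) = (x^2 - y^2, 2xy)."""
    return ((2 * x, -2 * y),
            (2 * y, 2 * x))

def apply_matrix(matrix, vector):
    """Multiply a two-by-two matrix by a plane vector."""
    (a, b), (c, d) = matrix
    vx, vy = vector
    return (a * vx + b * vy, c * vx + d * vy)

def angle(v, w):
    """Unsigned angle between two nonzero plane vectors, via the dot product."""
    dot = v[0] * w[0] + v[1] * w[1]
    cosine = dot / (math.hypot(*v) * math.hypot(*w))
    return math.acos(max(-1.0, min(1.0, cosine)))

if __name__ == "__main__":
    J = jacobian(1.0, 1.0)          # the point z0 = 1 + i, where f'(z0) != 0
    v, w = (1.0, 0.0), (1.0, 2.0)   # two tangent vectors at z0
    print(angle(v, w), angle(apply_matrix(J, v), apply_matrix(J, w)))
```

At the origin, where the derivative of $z^2$ vanishes, the Jacobian is the zero matrix and the comparison no longer makes sense, in line with the hypothesis of the theorem.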

This proof uses the Cauchy-Riemann equation approach to analyticity. 
A more geometric (and unfortunately a more vague) approach is to look 
carefully at the requirement for 


$$\lim_{h \to 0} \frac{f(z_0 + h) - f(z_0)}{h}$$


to exist, no matter what path is chosen for h to approach zero. This 
condition must place strong restrictions on how the function f alters angles. 

This also suggests how to approach the converse. It can be shown
(though we will not) that a conformal function f must satisfy either the
limit for analyticity

$$\lim_{h \to 0} \frac{f(z_0 + h) - f(z_0)}{h}$$

or that the limit holds for the conjugate function $\bar{f}$

$$\lim_{h \to 0} \frac{\bar{f}(z_0 + h) - \bar{f}(z_0)}{h},$$

where the conjugate function of f(z) = u(z) + iv(z) is

$$\bar{f}(z) = u(z) - i v(z).$$


9.6 The Riemann Mapping Theorem


Two domains $D_1$ and $D_2$ are said to be conformally equivalent if there is a
one-to-one onto conformal map

$$f : D_1 \to D_2.$$


If such a function f exists, then its inverse function will also be conformal. 
Since conformal basically means that f is analytic, if two domains are 
conformally equivalent, then it is not possible to distinguish between them 
using the tools from complex analysis. Considering that analytic functions 
are special among functions, it is quite surprising that there are clean results 
for determining when two domains are conformally equivalent. The main 
result is: 


Theorem 9.6.1 (Riemann Mapping Theorem) Two simply connected
domains, neither of which is equal to C, are conformally equivalent.


(Recall that a domain is simply connected if any closed loop in the 
domain is homotopic to a point in the domain, or intuitively, if every closed 
loop in the domain can be continuously shrunk to a point.) Frequently this 
result is stated as: for any simply connected domain D that is not equal to 
C, there is a conformal one-to-one onto map from D to the unit disc. Thus

[Figure: a simply connected domain.]

is conformally equivalent to

[Figure: the unit disc D.]

The Riemann Mapping Theorem, though, does not produce for us the desired
function f. In practice, it is an art to find the conformal map. The
standard approach is to first find conformal maps from each of the domains 
to the unit disc. Then, to conformally relate the two domains, we just 
compose various maps to the disc and inverses of maps to the disc. 


For example, consider the right half plane

$$D = \{z \in \mathbf{C} : \mathrm{Re}(z) > 0\}.$$

The function

$$f(z) = \frac{1 - z}{1 + z}$$

provides our conformal map from D to the unit disc. This can be checked
by showing that the boundary of D, the y-axis, maps to the boundary of
the unit disc. In this case, the inverse to f is f itself.
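
The claims in this example can be spot-checked numerically. The sketch below assumes the map is $f(z) = \frac{1 - z}{1 + z}$, a choice consistent with the statement that f is its own inverse; the sample points are arbitrary, and a finite check of this kind is an illustration and not a proof.

```python
def f(z):
    """Assumed map from the right half plane to the unit disc."""
    return (1 - z) / (1 + z)

if __name__ == "__main__":
    # Points with positive real part should land inside the unit disc.
    for z in (0.5 + 2j, 3 - 1j, 1 + 0j, 0.2 - 0.7j):
        print(z, abs(f(z)), f(f(z)))
    # Points on the y-axis should land on the unit circle.
    for y in (-3.0, 0.0, 0.5, 10.0):
        print(y, abs(f(1j * y)))
```

The reason behind the first check is that $|1 - z| < |1 + z|$ exactly when z is closer to 1 than to -1, that is, when Re(z) > 0.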

The Riemann Mapping Theorem is one reason why complex analysts 
spend so much time studying the function theory of the disc, as knowledge 
about the disc can be easily translated to knowledge about any simply 
connected domain. 

In several complex variables theory, all is much more difficult, in large
part because there is no higher dimensional analogue of the Riemann Mapping
Theorem. There are many simply connected domains in $\mathbf{C}^n$ that are
not conformally equivalent.


9.7 Several Complex Variables: Hartog’s Theorem


Let $f(z_1, \ldots, z_n)$ be a complex-valued function of n complex variables. We
say that f is holomorphic (or analytic) in several variables if $f(z_1, \ldots, z_n)$
is holomorphic in each variable $z_i$ separately. Although many of the basic
results for one variable analytic functions can be easily carried over to the 
several variable case, the subjects are profoundly different. These differ- 
ences start with Hartog’s Theorem, which is the subject of this section. 

Consider the one-variable function f(z) = 1/z. This function is holomorphic
at all points except at the origin, where it is not even defined. It is
thus easy to find a one-variable function that is holomorphic except at
one point. But what about the corresponding question for holomorphic
functions of several variables? Is there a function $f(z_1, \ldots, z_n)$ that is
holomorphic everywhere except at an isolated point? Hartog’s theorem is that
no such function can exist.


Theorem 9.7.1 (Hartog’s Theorem) Let U be an open connected region
in $\mathbf{C}^n$ and let V be a compact connected set contained in U. Then any
function $f(z_1, \ldots, z_n)$ that is holomorphic on U - V can be extended to a
holomorphic function that is defined on all of U.


This certainly includes the case when V is an isolated point. Before 
sketching a proof for a special case of this theorem, consider the following 
question that is now quite natural, namely, is there a natural condition on 
open connected sets U so that there will exist holomorphic functions on U 
that cannot be extended to a larger open set? Such sets U are called domains
of holomorphy. Hartog’s Theorem says that regions like U — (isolated point) 
are not domains of holomorphy. In fact, a clean criterion does exist and 
involves geometric conditions on the boundary of the open set U (techni- 
cally, the boundary must be pseudoconvex). Hartog’s Theorem opens up a 
whole new world of phenomena for several complex variables. 

One way of thinking about Hartog’s Theorem is in considering the function
$\frac{f(z_1, \ldots, z_n)}{g(z_1, \ldots, z_n)}$, where both f and g are holomorphic, as a possible
counterexample. If we can find a holomorphic function g that has a zero at an
isolated point or even on a compact set, then Hartog’s Theorem will be
false. Since Hartog’s Theorem is indeed a theorem, an analytic function in
more than one variable cannot have a zero at an isolated point. In fact,
the study of the zero locus $g(z_1, \ldots, z_n) = 0$ leads to much of algebraic and
analytic geometry.

Now to sketch a proof of Hartog’s Theorem, subject to simplifying as- 
sumptions that U is the polydisc 


U = {(z,w) : |z| < 1,|w| < 1} 

and that V is the isolated point (0,0). We will also use the fact that if two
functions that are holomorphic on an open connected region U are equal 
on an open subset of U, then they are equal on all of U. (The proof of this 
fact is similar to the corresponding result in one-variable complex analysis, 
which can be shown to follow from exercise three at the end of this chapter.) 

Let f(z,w) be a function that is holomorphic on U - (0,0). We want to
extend f to be a holomorphic function on all of U. Consider the sets z = c,
where c is a constant with |c| < 1. Then the set

$$(z = c) \cap (U - (0,0))$$

is an open disc of radius one if $c \neq 0$ and an open disc punctured at the
origin if c = 0. Define a new function by setting

$$F(z, w) = \frac{1}{2\pi i} \int_{|v| = \frac{1}{2}} \frac{f(z, v)}{v - w}\, dv.$$

This will be our desired extension. First, the function F is defined at all 
points of U, including the origin. Since the z variable is not varying in the 
integral, we have by Cauchy’s Integral Formula that F(z, w) is holomorphic 
in the w variable. Since the original function f is holomorphic with respect 
to the z variable, we have that F is holomorphic with respect to z; thus F 
is holomorphic on all of U. But again by Cauchy’s Integral Formula, we 
have that F = f when $z \neq 0$. Since the two holomorphic functions are
equal on an open set of U, then we have equality on U — (0,0). 

The general proof of Hartog’s Theorem is similar, namely to reduce the 
problem to slicing the region U into a bunch of discs and punctured discs 
and then using Cauchy’s Integral Formula to create the new extension. 


9.8 Books 


Since complex analysis has many applications, there are many beginning 
textbooks, each emphasizing different aspects of the subject. An excellent 
introduction is in Marsden and Hoffman’s Basic Complex Analysis [83]. 
Palka’s An Introduction to Complex Function Theory [92] is also an excellent 
text. (I first learned complex analysis from Palka.) A recent beginning book 
is Greene and Krantz’ Function Theory of One Complex Variable [49]. For 
a rapid fire introduction, Spiegel’s Complex Variables [101] is outstanding,
containing a wealth of concrete problems. 

There are a number of graduate texts in complex analysis, which do 
start at the beginning but then build quickly. Ahlfors’ book [1] has long 
been the standard. It reflects the mathematical era in which it was written 
(the 1960s) and thus approaches the subject from a decidedly abstract point 
of view. Conway’s Functions of One Complex Variable [21] has long been
the prime competitor to Ahlfors for the beginning graduate student market 
and is also quite good. The recent book by Berenstein and Gay [8] provides 
a modern framework for complex analysis. A good introduction to complex 
analysis in several variables is Krantz’ Function Theory in Several Variables 
[77]. 

Complex analysis is probably the most beautiful subject in undergradu- 
ate mathematics. Neither Krantz’ Complex Analysis: The Geometric View- 
point [78] nor Davis’ The Schwarz Function and its Applications [25] are 
textbooks but both show some of the fascinating implications contained in 
complex analysis and are good places to see how analytic functions
can be naturally linked to other parts of mathematics. 


9.9 Exercises 


1. Letting z = x + iy, show that the function

$$f(z) = f(x, y) = y^2$$

is not analytic. Show that it does not satisfy the Cauchy Integral Formula

$$f(z_0) = \frac{1}{2\pi i} \int_\sigma \frac{f(z)}{z - z_0}\, dz,$$

for the case when $z_0 = 0$ and when the closed loop $\sigma$ is the circle of radius
one centered at the origin.

2. Find a function f(z) that is not analytic, besides the function given in
problem one. If you think of f(z) as a function of the two variables

$$f(x, y) = u(x, y) + i v(x, y),$$

almost any choice of functions u and v will work.

3. Let f(z) and g(z) be two analytic functions that are equal at all points
on a closed loop $\sigma$. Show that for all points z in the interior of the closed
loop we have the two functions equal. As a hint, start with the assumption
that g(z) is the zero function and thus that f(z) is zero along the loop $\sigma$.
Then show that f(z) must also be the zero function inside the loop.

4. Find a one-to-one onto conformal map from the unit disc $\{(x, y) : x^2 + y^2 < 1\}$
to the first quadrant of the plane $\{(x, y) : x > 0 \text{ and } y > 0\}$.

5. Let $z_1$, $z_2$ and $z_3$ be three distinct complex numbers. Show that we can
find numbers a, b, c and d with ad - bc = 1 such that the map

$$T(z) = \frac{az + b}{cz + d}$$

maps $z_1$ to 0, $z_2$ to 1 and $z_3$ to 2. Show that the numbers a, b, c and d are
uniquely determined, up to multiplication by -1.


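6. Find $\int_{-\infty}^{\infty} \frac{dx}{1 + x^2}$ as follows:
a. Find

$$\int_\gamma \frac{dz}{1 + z^2},$$

where $\gamma = \gamma_1 + \gamma_2$ is the closed loop in the complex plane

[Figure: the semicircle $\gamma_1$ of radius R in the upper half plane, closed by the segment $\gamma_2$ from -R to R.]

consisting of the path

$$\gamma_1 = \{Re^{i\theta} : 0 \leq \theta \leq \pi\}$$

and

$$\gamma_2 = \{(x, 0) \in \mathbf{R}^2 : -R \leq x \leq R\}.$$

b. Show that

$$\lim_{R \to \infty} \int_{\gamma_1} \frac{dz}{1 + z^2} = 0.$$

c. Conclude with the value for $\int_{-\infty}^{\infty} \frac{dx}{1 + x^2}$.
(This is a standard problem showing how to calculate hard real integrals
easily. This is a hard problem if you have never used residues before; it
should be straightforward if you have.)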
7. The goal of this problem is to construct a conformal map from the unit 
sphere (minus the north pole) to the complex numbers. Consider the sphere 
$S^2 = \{(x, y, z) : x^2 + y^2 + z^2 = 1\}$.
a. Show that the map

$$\pi : S^2 - (0, 0, 1) \to \mathbf{C}$$

defined by

$$\pi(x, y, z) = \frac{x}{1 - z} + i \frac{y}{1 - z}$$

is one-to-one, onto and conformal.

b. We can consider the complex numbers C as sitting inside $\mathbf{R}^3$ by
mapping x + iy to the point (x, y, 0). Show that the above map $\pi$ can be
interpreted as the map that sends a point (x, y, z) on $S^2 - (0, 0, 1)$ to the
point on the plane (z = 0) that is the intersection of the plane with the line
through (x, y, z) and (0, 0, 1).

[Figure: the sphere with the point (0, 0, 1) marked and a line through it meeting the plane z = 0.]

c. Justify why people regularly identify the unit sphere with $\mathbf{C} \cup \infty$.


Chapter 10 


Countability and the 
Axiom of Choice 


Basic goal: Comparing infinite sets 


Both countability and the axiom of choice grapple with the elusive notions 
behind “infinity”. While both the integers Z and the real numbers R are 
infinite sets, we will see that the infinity of the reals is strictly larger than 
the infinity of the integers. We will then turn to the Axiom of Choice, 
which, while straightforward and not an axiom at all for finite sets, is deep 
and independent from the other axioms of mathematics when applied to 
infinite collections of sets. Further, the Axiom of Choice implies a number 
of surprising and seemingly paradoxical results. For example, we will show 
that the Axiom of Choice forces the existence of sets of real numbers that 
cannot be measured. 


10.1 Countability 


The key is that there are different orders or magnitudes of infinity. The 
first step is to find the right definition for when two sets are of the same 
size. 


Definition 10.1.1 A set A is finite of cardinality n if there is a one-to-one
onto function from the set {1, 2, 3, ..., n} to A. The set A is countably
infinite if there is a one-to-one onto function from the natural numbers
N = {1, 2, 3, ...} to A. A set that is either finite or countably infinite is
said to be countable. A set A is uncountably infinite if it is not empty and
not countable.

For example, the set {a, b, c} is finite with 3 elements. The more troubling
and challenging examples appear in the infinite cases.
For example, the positive even numbers

$$2\mathbf{N} = \{2, 4, 6, 8, \ldots\},$$

while properly contained in the natural numbers N, are of the same size as
N and hence are countably infinite. An explicit one-to-one onto map

$$f : \mathbf{N} \to 2\mathbf{N}$$

is f(n) = 2n. Usually this one-to-one correspondence is shown via:

[Figure: the correspondence 1 ↔ 2, 2 ↔ 4, 3 ↔ 6, 4 ↔ 8, 5 ↔ 10, 6 ↔ 12, ...]

The set of whole numbers {0, 1, 2, 3, ...} is also countably infinite, as seen
by the one-to-one onto map

$$f : \mathbf{N} \to \{0, 1, 2, 3, \ldots\}$$

given by

$$f(n) = n - 1.$$

Here the picture is

[Figure: the correspondence 1 ↔ 0, 2 ↔ 1, 3 ↔ 2, 4 ↔ 3, 5 ↔ 4, 6 ↔ 5, ...]

The integers Z are also countably infinite. The picture is

[Figure: the correspondence 1 ↔ 0, 2 ↔ 1, 3 ↔ -1, 4 ↔ 2, 5 ↔ -2, 6 ↔ 3, 7 ↔ -3, ...]

while an explicit one-to-one onto function

$$f : \mathbf{N} \to \mathbf{Z}$$

is, for even n,

$$f(n) = \frac{n}{2}$$

and, for odd n,

$$f(n) = -\frac{n - 1}{2}.$$

It is typical for the picture to be more convincing than the actual function. 
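
For readers who prefer to see the function at work, here is a small Python sketch. It assumes the explicit map reads $f(n) = n/2$ for even n and $f(n) = -(n - 1)/2$ for odd n, with the natural numbers starting at 1; the names `nat_to_int` and `int_to_nat` are our own.

```python
def nat_to_int(n):
    """f(n) = n/2 for even n and -(n - 1)/2 for odd n, for n = 1, 2, 3, ..."""
    if n < 1:
        raise ValueError("n must be a natural number (n >= 1)")
    return n // 2 if n % 2 == 0 else -((n - 1) // 2)

def int_to_nat(k):
    """Inverse map: positive k goes to 2k, and k <= 0 goes to 1 - 2k."""
    return 2 * k if k > 0 else 1 - 2 * k

if __name__ == "__main__":
    print([nat_to_int(n) for n in range(1, 8)])
```

Having an explicit inverse is what shows the map is both one-to-one and onto; the finite list printed here only displays the pattern.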
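The rationals

$$\mathbf{Q} = \left\{ \frac{p}{q} : p, q \in \mathbf{Z}, q \neq 0 \right\}$$

are also countably infinite. The picture for showing that the positive rationals
are countably infinite is as follows:

[Figure: an array of the positive rationals p/q, with a path through the array indicating the order in which they are counted.]

Every positive rational appears in the above array and will eventually be
hit by a natural number.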
In fact 


Theorem 10.1.1 Let A and B be two countably infinite sets. Then the 
Cartesian product A x B is also countably infinite. 

Proof: Since both A and B are in one-to-one correspondence with the
natural numbers N, all we need show is that the product N x N is countably
infinite. For $\mathbf{N} \times \mathbf{N} = \{(n, m) : n, m \in \mathbf{N}\}$, the correct diagram is:

[Figure: the array of pairs (1,1), (1,2), (1,3), ..., counted along successive diagonals.]

More algebraically, but less clearly, an explicit one-to-one onto map

$$f : \mathbf{N} \times \mathbf{N} \to \mathbf{N}$$

is

$$f(m, n) = \frac{(n + m - 2)(n + m - 1)}{2} + m.$$
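
A short Python sketch can make the algebraic map more convincing. It assumes the formula is $f(m, n) = \frac{(n + m - 2)(n + m - 1)}{2} + m$, which counts the pairs along the diagonals m + n = 2, 3, 4, ...; the names `pair` and `unpair` are hypothetical.

```python
def pair(m, n):
    """Assumed map f(m, n) = (m + n - 2)(m + n - 1)/2 + m, for m, n >= 1."""
    return (m + n - 2) * (m + n - 1) // 2 + m

def unpair(k):
    """Inverse of pair: find the diagonal d = m + n that contains index k."""
    d = 2
    while (d - 1) * d // 2 < k:  # diagonal d ends at index (d - 1) * d / 2
        d += 1
    m = k - (d - 2) * (d - 1) // 2
    return (m, d - m)

if __name__ == "__main__":
    print([unpair(k) for k in range(1, 11)])
```

Checking that the first few diagonals are sent exactly onto an initial segment 1, 2, ..., with no repeats and no gaps, is a finite version of the one-to-one and onto claims.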


Note that the fact that N x N is the same size as N is of course in 
marked contrast to the finite case. To make this painfully obvious, consider 
A = {a,b,c}, a set with three elements. Then A x A is the nine element 
set {(a, a), (a, b), (a, c), (b, a), (b, b), (b, c), (c, a), (c, b), (c, c)}.

There are infinite sets which, in some sense, are of size strictly larger 
than the natural numbers. Far from being esoteric, the basic example is 
the set of real numbers; the reals, while certainly not finite, are also not 
countably infinite. 

We will give the famed Cantor diagonalization argument showing that
the real numbers $[0, 1] = \{x \in \mathbf{R} : 0 \leq x \leq 1\}$ cannot be countable.


Theorem 10.1.2 The interval [0,1] is not countable. 


Proof: The proof is by contradiction. We assume that there is a one-to-one
onto map $f : \mathbf{N} \to [0, 1]$ and then find a real number in [0,1] that is
not in the image, contradicting the assumption that f is onto. We will use
that every real number in [0, 1] can be expressed as a decimal expansion

$$0.x_1 x_2 x_3 x_4 \ldots,$$

where each $x_k$ is 0, 1, 2, 3, ... or 9. To make this expansion unique, we will
always round up, except for the case 0.99999... which we leave as is. Thus
0.32999... will always be written as 0.33000....

Now let us take our assumed one-to-one correspondence $f : \mathbf{N} \to [0, 1]$
and start writing down its terms. Let

$$\begin{aligned}
f(1) &= .a_1 a_2 a_3 \ldots, \\
f(2) &= .b_1 b_2 b_3 \ldots, \\
f(3) &= .c_1 c_2 c_3 \ldots, \\
f(4) &= .d_1 d_2 d_3 \ldots, \\
f(5) &= .e_1 e_2 e_3 \ldots,
\end{aligned}$$

and so forth. Note that the $a_i, b_i$, etc. are now fixed numbers between 0
and 9, given to us by the assumed one-to-one correspondence. They are
not variables.

We will construct a new real number $.N_1 N_2 N_3 N_4 \ldots$ which will never
appear in the above list, forcing a contradiction to the assumption that f
is onto. Set

$$N_k = \begin{cases} 4, & \text{if the } k\text{th entry of } f(k) \neq 4 \\ 5, & \text{if the } k\text{th entry of } f(k) = 4 \end{cases}$$

(The choice of the numbers 4 and 5 is not important; any two integers
between 0 and 9 would do just as well.)
Note that $N_1$ is 4 if $a_1 \neq 4$ and is 5 if $a_1 = 4$. Thus, no matter what,

$$.N_1 N_2 N_3 \ldots \neq .a_1 a_2 a_3 \ldots = f(1).$$

Likewise $N_2$ is 4 if $b_2 \neq 4$ and is 5 if $b_2 = 4$ and hence

$$.N_1 N_2 N_3 \ldots \neq .b_1 b_2 b_3 \ldots = f(2).$$

This continues. Since our decimal expansions are unique, and since each
$N_k$ is defined so that it is not equal to the kth term in f(k), we must have
that $.N_1 N_2 N_3 \ldots$ is not equal to any f(k), meaning that f cannot be onto.
Thus there can never be an onto function from the natural numbers to
the interval [0,1]. Since the reals are certainly not finite, they must be
uncountably infinite.
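
The diagonal rule can be tried on a finite list of digit strings. The Python sketch below is only a finite stand-in for the argument, with hypothetical names and made-up sample rows: it builds the digits $N_1 N_2 \ldots$ by the rule "use 4, unless the kth digit of the kth row is 4, in which case use 5".

```python
def diagonal_digits(rows):
    """Return the digits N_1 N_2 ... for a list of digit strings, where
    N_k is 4 unless the k-th digit of the k-th row is 4, and then N_k is 5."""
    return "".join("5" if row[k] == "4" else "4"
                   for k, row in enumerate(rows))

if __name__ == "__main__":
    rows = ["1415", "7182", "4142", "3344"]  # made-up digits of f(1), ..., f(4)
    new = diagonal_digits(rows)
    print(new)
    # The new string differs from the k-th row in the k-th digit.
    print(all(new[k] != rows[k][k] for k in range(len(rows))))
```

However the list is chosen, the string produced differs from the kth row in its kth place, which is the whole point of the construction.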


10.2 Naive Set Theory and Paradoxes 


The question of what is a mathematical object was a deep source of debate
in the last part of the nineteenth and first part of the twentieth century.
There has only been at best a partial resolution, caused in part by Gödel’s
work in logic and in part by exhaustion. Does a mathematical object exist
only if an algorithm can be written that will explicitly construct the object 
or does it exist if the assumption of its existence leads to no contradictions, 
even if we can never find an example? The tension between constructive 
proofs versus existence proofs has in the last thirty years been eased with 
the development of complexity theory. The constructive camp was led by 
Kronecker (1823-1891), Brouwer (1881-1966) and Bishop (1928-1983). The 
existential camp, led by Hilbert (1862-1943), won the war, leading to most 
mathematicians’ belief that all of mathematics can be built out of a correct, 
set-theoretic foundation, usually believed to be an axiomatic system called 
Zermelo-Fraenkel plus the Axiom of Choice (for a list of those axioms, see 
Paul Cohen’s Set Theory and the Continuum Hypothesis [20] Chapter II, 
Sections 1 and 2). This is in spite of the fact that few working mathemati- 
cians can actually write down these axioms, which certainly suggests that 
our confidence in our work does not stem from the axioms. More accurately, 
the axioms were chosen and developed to yield the results we already know 
to be true. In this section we informally discuss set theory and then give 
the famed Zermelo-Russell paradox, which shows that true care must be 
exercised in understanding sets. 

The naive idea of a set is pretty good. Here a set is some collection of 
objects sharing some property. For example 


{n : n is an even number}


is a perfectly reasonable set. Basic operations are union, intersection and 
complement. We will see now how to build integers out of sets. 

First for one subtlety. Given a set A, we can always form a new set,
denoted by {A}, which consists of just one element, namely the set A. If
A is the set of all even integers and thus contains an infinite number of
elements, the set {A} has only one element. Given a set A, we define the
successor set $A^+$ as the union of the set A with the set {A}. Thus $x \in A^+$
if either $x \in A$ or x = A.

We start with the empty set $\emptyset$, the set that contains no elements. This
set will correspond to the integer 0. Then we label the successor to the
empty set by 1:

$$1 = \emptyset^+ = \{\emptyset\},$$

the successor to the successor of the empty set by 2:

$$2 = (\emptyset^+)^+ = \{\emptyset, \{\emptyset\}\},$$

and in general the successor to the set n by n + 1.
By thinking of the successor as adding by one, we can recover by recur- 
sion addition and thus in turn multiplication, subtraction and division. 
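
The successor construction is easy to model in Python, where a `frozenset` may itself be an element of another set. The sketch below is a small illustration only; the names `successor` and `numeral` are our own.

```python
def successor(a):
    """The successor set: A together with the one extra element A itself."""
    return a | frozenset([a])

def numeral(n):
    """Build the set corresponding to the integer n, starting from the empty set."""
    s = frozenset()
    for _ in range(n):
        s = successor(s)
    return s

if __name__ == "__main__":
    two = numeral(2)
    print(two == frozenset([frozenset(), frozenset([frozenset()])]))
    print(len(numeral(5)))
```

In this model the set for n has exactly n elements, namely the sets for 0, 1, ..., n - 1.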

Unfortunately, just naively proceeding along in this fashion will lead
to paradoxes. We will construct here what appears to be a set but which 
cannot exist. First, note that sometimes a set can be a member of itself 
and sometimes not (at least if we are working in naive set theory; much of 
the mechanics of Zermelo-Fraenkel set theory is to prevent such nonchalant 
assumptions about sets). For example, the set of even numbers is not itself 
an even number and hence is not an element of itself. On the other hand, 
the set of all elements that are themselves sets with more than two elements 
is a member of itself. We can now define our paradoxical set. Set

$$X = \{A : A \text{ is a set that does not contain itself}\} = \{A : A \notin A\}.$$

Is the set X an element of itself? If $X \in X$, then by the definition of X,
we must have $X \notin X$, which is absurd. But if $X \notin X$, then $X \in X$, which
is also silly. There are problems with allowing X to be a set. This is the
Zermelo-Russell paradox.

Do not think this is just a trivial little problem. Russell (1872-1970) 
reports in his autobiography that when he first thought of this problem 
he was confident it could easily be resolved, probably that night after din- 
ner. He spent the next year struggling with it and had to change his whole 
method of attack on the foundations of mathematics. (Russell, with Whitehead
(1861-1947), did not use set theory but instead developed type theory;
type theory is abstractly no better or worse than set theory, but mathematicians
base their work on the language of set theory, probably by the
historical accident of World War II, which led US mathematicians to be
taught by German refugees, who knew set theory, as Zermelo (1871-1953)
was German.)

Do not worry too much about the definitions of set theory. You should 
be nervous, though, if your sets refer to themselves, as this is precisely what 
led to the above difficulty. 


10.3 The Axiom of Choice


The axioms in set theory were chosen and developed to yield the results we 
already know to be true. Still, we want these axioms to be immediately ob- 
vious. Overall, this is the case. Few of the actual axioms are controversial, 
save for the Axiom of Choice, which states: 


Axiom 10.3.1 (Axiom of Choice) Let $\{X_\alpha\}$ be a family of nonempty
sets. Then there is a set X which contains, from each set $X_\alpha$, exactly one
element.

For a finite collection of sets, this is obvious and not at all axiomatic
(meaning that it can be proven from other axioms). For example, let $X_1 = \{a, b\}$
and $X_2 = \{c, d\}$. Then there is certainly a set X containing one
element from $X_1$ and one element from $X_2$; for example, just let X = {a, c}.

The difficulties start to arise when applying the axiom to an infinite 
(possibly uncountably infinite) number of sets. The Axiom of Choice gives 
no method for finding the set X; it just mandates the existence of X. This 
leads to the observation that if the Axiom of Choice is needed to prove the 
existence of some object, then you will never be able to actually construct 
that object. In other words, there will be no method to actually construct 
the object; it will merely be known to exist. 

Another difficulty lies not in the truth of the Axiom of Choice but in the
need to assume it as an axiom. Axioms should be clear and obvious. No 
one would have any difficulty with its statement if it could be proven to 
follow from the other axioms. 

In 1939, Kurt Gödel showed that the Axiom of Choice is consistent with 
the other axioms. This means that using the Axiom of Choice will lead to 
no contradictions that were not, in some sense, already present in the other 
axioms. But in the early 1960s, Paul Cohen [20] showed that the Axiom 
of Choice was independent of the other axioms, meaning that it cannot be 
derived from the other axioms and hence was truly an axiom. In particular, 
one can assume that the Axiom of Choice is false and still be confident that 
no contradictions will arise. 

A third difficulty with the Axiom of Choice is that it is equivalent to any 
number of other statements, some of which are quite bizarre. To see some 
of the many equivalences to the Axiom of Choice, see Howard and Rubin’s 
Consequences of the Axiom of Choice [62]. One of these equivalences is the 
subject of the next section. 


10.4 Non-measurable Sets 


Warning: This section will assume a working knowledge of Lebesgue mea- 
sure on the real numbers. In particular, we will need that 


• If a set A is measurable, its measure m(A) is equal to its outer
measure $m^*(A)$.

• If $A_1, A_2, \ldots$ are disjoint sets that are measurable, then the union is
measurable, with

$$m\left( \bigcup_{i=1}^{\infty} A_i \right) = \sum_{i=1}^{\infty} m(A_i).$$

This last condition corresponds to the idea that if we have two sets with
lengths a and b, say, then the length of the two sets placed next to each
other should be a + b. Also, this example closely follows the example of a
nonmeasurable set in Royden’s Real Analysis [95].

We will find a sequence of disjoint sets $A_1, A_2, \ldots$, all of which have the
same outer measure and hence, if measurable, the same measure, whose
union is the unit interval [0,1]. Since the Lebesgue measure of the unit
interval is just its length, we will have

$$1 = \sum_{i=1}^{\infty} m(A_i).$$

If each $A_i$ is measurable, since the measures are equal, this would mean
that we can add a number to itself infinitely many times and have it sum
to one. This is absurd. If a series converges, then the individual terms in
the series must converge to zero. Equal terms would thus all have to be
zero, and then the sum would be zero and not one.

The point of this section is that to find these sets $A_i$, we will need to use
the Axiom of Choice. This means that we are being fairly loose with the 
term “find”, as these sets will in no sense actually be constructed. Instead, 
the Axiom of Choice will allow us to claim their existence, without actually 
finding them. 

We say that x and y in [0,1] are equivalent, denoted by $x \equiv y$, if x - y
is a rational number. It can be checked that this is an equivalence relation
(see Appendix A for the basic properties of equivalence relations) and thus
splits the unit interval into disjoint equivalency classes.

We now apply the Axiom of Choice to these disjoint sets. Let A be the 
set containing exactly one element from each of these equivalency classes. 
Thus the difference between any two elements of A cannot be a rational 
number. Note again, we do not have an explicit description of A. We have 
no way of knowing if a given real number is in A, but, by the Axiom of 
Choice, the set A does exist. In a moment we will see that A cannot be 
measurable. 

We will now find a countable collection of disjoint sets, each with the
same outer measure as the outer measure of the set A, whose union will be
the unit interval. Now, since the rational numbers in [0,1] are countable,
we can list all rational numbers between zero and one as $r_0, r_1, r_2, \ldots$. For
convenience, assume that $r_0 = 0$. For each rational number $r_i$, set

$$A_i = A + r_i \pmod 1.$$

Thus the elements of $A_i$ are of the form

$$a + r_i - \text{greatest integer part of } (a + r_i).$$

In particular, $A = A_0$. It is also the case that for all i

$$m^*(A) = m^*(A_i),$$

which is not hard to show, but is mildly subtle since we are not just shifting
the set A by the number $r_i$ but are then modding out by one.

We now want to show that the $A_i$ are disjoint and cover the unit interval.
First, assume that there is a number x in the intersection of $A_i$ and $A_j$.
Then there are numbers $a_i$ and $a_j$ in the set A such that

$$x = a_i + r_i \pmod 1 = a_j + r_j \pmod 1.$$

Then $a_i - a_j$ is a rational number, meaning that $a_i = a_j$, which forces i = j.
Thus if $i \neq j$, then

$$A_i \cap A_j = \emptyset.$$


Now let $x$ be any element in the unit interval. It must be equivalent to
some element $a$ in $A$. Thus there is a rational number $r_i$ in the unit interval
with either

$$x = a + r_i \quad \text{or} \quad a = x + r_i.$$

In the first case $x \in A_i$. In the second case $x = a + (1 - r_i) \pmod 1$, so $x$ lies
in the set $A_j$ with $r_j = 1 - r_i$. Thus the $A_i$ are indeed a countable collection
of disjoint sets that cover the unit interval. But then we have the length of
the unit interval as an infinite series of the same number:

$$1 = \sum_{i=0}^{\infty} m(A_i) = \sum_{i=0}^{\infty} m(A),$$


which is impossible. Thus the set A cannot be measurable. 


10.5 Gödel and Independence Proofs 


In the debates about the nature of mathematical objects, all agreed that
correct mathematics must be consistent (i.e., it should not be possible to
prove both a statement and its negation). Eventually it was realized that
most people were also implicitly assuming that mathematics was complete
(meaning that any mathematical statement must ultimately be capable
of being either proven or disproven). David Hilbert wanted to translate
both of these goals into precise mathematical statements, each capable of
rigorous proof. This attempt became known as Formalism. Unfortunately
for Hilbert’s school, K. Gödel (1906-1978) in 1931 destroyed these
hopes. Gödel showed:


Any axiomatic system strong enough to include basic arithmetic must have
statements in it that can be neither proven nor disproven, within the system.




Further, the example Gödel gave of a statement that could be neither proven
nor disproven was that the given axiomatic system was itself consistent. 


Thus in one fell swoop, Gödel showed that both consistency and com- 
pleteness were beyond our grasp. Of course, no one seriously thinks that 
modern mathematics has within it a hidden contradiction. There are state- 
ments, though, that people care about that are not capable of being proven 
or disproven within Zermelo-Fraenkel set theory. The Axiom of Choice is 
an example of this. Such statements are said to be independent of the other 
axioms of mathematics. On the other hand, most open questions in math- 
ematics are unlikely to be independent of Zermelo-Fraenkel set theory plus 
the Axiom of Choice. One possible exception is the question of P=NP (discussed in
Chapter Sixteen), which some now suspect to be independent of the
rest of mathematics.


10.6 Books 


For many years the best source for getting an introduction to set theory has 
been Halmos’ Naive Set Theory [53], which he wrote, in large part, to teach 
himself the subject. A more recent text is Moschovakis’ Notes on Set Theory 
[87]. An introduction, not to set theory, but to logic is Incompleteness
Phenomenon by Goldstern and Judah [46]. A slightly more advanced text,
by a tremendous expositor, is Smullyan’s Gödel’s Incompleteness Theorems
[100]. A concise, high level text is Cohen’s Set Theory and the Continuum
Hypothesis [20].

A long time popular introduction to Gödel’s work has been Nagel and
Newman’s Gödel’s Proof [89]. This is one of the inspirations for the amazing
book of Hofstadter, Gödel, Escher, Bach [61]. Though not precisely a
math book, it is full of ideas and should be read by everyone. Another
impressive recent work is Hintikka’s Principles of Mathematics, Revisited
[60]. Here a new scheme for logic is presented. It also contains a summary
of Hintikka’s game-theoretic interpretation of Gödel’s work.


10.7 Exercises 
1. Show that the set 


$$\{ax^2 + bx + c : a, b, c \in \mathbb{Q}\}$$


of all one variable polynomials of degree two with rational coefficients is
countable.
2. Show that the set of all one variable polynomials with rational coefficients
is countable.
3. Show that the set


$$\{a_0 + a_1 x + a_2 x^2 + \cdots : a_0, a_1, a_2, \ldots \in \mathbb{Q}\}$$


of all formal power series in one variable with rational coefficients is not 
countable. 
4. Show that the set of all infinite sequences consisting of zeros and twos 
is uncountable. (This set will be used to show that the Cantor set, which 
will be defined in Chapter Twelve, is uncountable.) 
5. In section two, the whole numbers were defined as sets. Addition by one 
was defined. Give a definition for addition by two and then a definition in 
general for whole numbers. Using this definition, show that 2 + 3 = 3 + 2.
6. (Hard) A set S is partially ordered if there is an operation < such that
given any two elements x and y, we have x < y, y < x, x = y or x and
y have no relationship. The partial ordering is a total ordering if,
given any two elements x and y, it must be the case that
x < y, y < x or x = y. For example, if S is the real numbers, the standard
interpretation of < as less than places a total ordering on the reals. On the 
other hand, if S is the set of all subsets of some other set, then a partial 
ordering would exist if we let < denote set containment. This is not a total 
ordering since given any two subsets, it is certainly not the case that one 
must be contained in the other. A partially ordered set is called a poset. 

Let S be a poset. A chain in S is a subset of S on which the partial 
ordering becomes a total ordering. Zorn’s Lemma states that if S is a poset 
such that every chain has an upper bound, then S contains a maximal
element. Note that the upper bound to a chain need not be in the chain 
and that the maximal element need not be unique. 

a. Show that the Axiom of Choice implies Zorn’s Lemma. 

b. Show that Zorn’s Lemma implies the Axiom of Choice (this is quite 
a bit harder). 
7. (Hard) The Hausdorff Maximal Principle states that every poset has a 
maximal chain, meaning a chain that is not strictly contained in any other 
chain. Show that the Hausdorff Maximal Principle is equivalent to the 
Axiom of Choice. 
8. (Hard) Show that the Axiom of Choice (via the Hausdorff Maximal 
Principle) implies that every field is contained in an algebraically closed 
field. (For the definitions, see Chapter Eleven.) 


Chapter 11 


Algebra 


Basic Objects: Groups and rings 
Basic Maps: Group and ring homomorphisms 


While current abstract algebra does indeed deserve the adjective abstract, 
it has both concrete historical roots and modern day applications. Central 
to undergraduate abstract algebra is the notion of a group, which is the 
algebraic interpretation of the geometric idea of symmetry. We can see 
something of the richness of groups in that there are three distinct areas 
that gave birth to the correct notion of an abstract group: attempts to 
find (more accurately, attempts to prove the inability to find) roots of 
polynomials, the study by chemists of the symmetries of crystals, and the 
application of symmetry principles to solve differential equations. 

The inability to generalize the quadratic equation to polynomials of 
degree greater than or equal to five is at the heart of Galois Theory and 
involves the understanding of the symmetries of the roots of a polynomial. 
Symmetries of crystals involve properties of rotations in space. The use 
of group theory to understand the symmetries underlying a differential 
equation leads to Lie Theory. In all of these the idea and the applications 
of a group are critical. 


11.1 Groups 
This section presents the basic definitions and ideas of group theory. 


Definition 11.1.1 A nonempty set G that has a binary operation 


$$G \times G \to G,$$

denoted for all elements $a$ and $b$ in $G$ by $a \cdot b$, is a group if:
i) There is an element $e \in G$ such that $e \cdot a = a \cdot e = a$, for all $a$ in $G$.
(The element $e$ is of course called the identity.)
ii) For any $a \in G$, there is an element denoted by $a^{-1}$ such that $a a^{-1} =
a^{-1} a = e$. (Naturally enough, $a^{-1}$ is called the inverse of $a$.)
iii) For all $a, b, c \in G$, we have $(a \cdot b) \cdot c = a \cdot (b \cdot c)$ (i.e., we must have
associativity).


Note that commutativity is not required. 

Now for some examples. Let GL(n,R) denote the set of all n x n 
invertible matrices with real coefficients. Under matrix multiplication, we 
claim that GL(n,R) is a group. The identity element of course is simply
the identity matrix

$$I = \begin{pmatrix} 1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 1 \end{pmatrix}.$$


The inverse of an element will be its matrix inverse. The check that matrix 
multiplication is associative is a long calculation. The final thing to check 
is to see that if $A$ and $B$ are invertible n x n matrices, then their product,
$A \cdot B$, must be invertible. From the key theorem of linear algebra, a matrix is
invertible if and only if its determinant is nonzero. Using that $\det(A \cdot B) =
\det(A) \det(B)$, we have


$$\det(A \cdot B) = \det(A) \cdot \det(B) \neq 0.$$


Thus GL(n,R) is a group. 
Note that for almost any choice of two matrices 


$$A \cdot B \neq B \cdot A.$$


The group is not commutative. Geometrically, we can interpret the elements
of GL(n,R) as linear maps on $\mathbb{R}^n$. In particular, consider rotations
in three-space. These do not commute (showing this is an exercise at the 
end of this chapter). Rotations can be represented as invertible 3 x 3 matri- 
ces and hence as elements in GL(3,R). If we want groups to be an algebraic 
method for capturing symmetry, then we will want rotations in space to 
form a group. Hence we cannot require groups to be commutative. (Note 
that rotations are associative, which is why we do require groups to be 
associative.) 
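
To make the closure and commutativity claims concrete, here is a minimal Python sketch for 2 x 2 matrices. The matrices A and B are arbitrary illustrative choices, not taken from the text, and exact rational arithmetic is used so that the comparisons are not affected by rounding.

```python
from fractions import Fraction

def matmul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def det2(M):
    """Determinant of a 2 x 2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

# Two hypothetical invertible matrices with rational entries.
A = [[Fraction(1), Fraction(2)], [Fraction(3), Fraction(4)]]
B = [[Fraction(0), Fraction(1)], [Fraction(1), Fraction(1)]]

AB = matmul(A, B)
BA = matmul(B, A)

det_is_multiplicative = det2(AB) == det2(A) * det2(B)
product_is_invertible = det2(AB) != 0
matrices_commute = AB == BA

print(det_is_multiplicative, product_is_invertible, matrices_commute)
```

For these particular choices the determinant of the product equals the product of the determinants and is nonzero, while AB and BA differ.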

The key examples of finite groups are the permutation groups. The
permutation group, $S_n$, is the set of all permutations on n distinct elements.
The binary operation is composition while the identity element is the trivial
permutation that permutes nothing.

To practice with the usual notation, let us look at the group of permutations
on three elements:


$$S_3 = \{e, (12), (13), (23), (123), (132)\}.$$


Of course we need to explain the notation. Fix an ordered triple $(a_1, a_2, a_3)$
of numbers. Here order matters. Thus (cow, horse, dog) is different from
the triple (dog, horse, cow). Each element of $S_3$ will permute the ordering
of the ordered triple. Specifically, the element (12) permutes $(a_1, a_2, a_3)$ to
$(a_2, a_1, a_3)$:


$$(a_1, a_2, a_3) \xrightarrow{(12)} (a_2, a_1, a_3).$$


For example, the element (12) will permute (cow, horse, dog) to the triple
(horse, cow, dog). The other elements of the group $S_3$ act as follows: (13)
permutes $(a_1, a_2, a_3)$ to $(a_3, a_2, a_1)$:

$$(a_1, a_2, a_3) \xrightarrow{(13)} (a_3, a_2, a_1),$$

(23) permutes $(a_1, a_2, a_3)$ to $(a_1, a_3, a_2)$:

$$(a_1, a_2, a_3) \xrightarrow{(23)} (a_1, a_3, a_2),$$

(123) permutes $(a_1, a_2, a_3)$ to $(a_3, a_1, a_2)$:

$$(a_1, a_2, a_3) \xrightarrow{(123)} (a_3, a_1, a_2),$$

(132) permutes $(a_1, a_2, a_3)$ to $(a_2, a_3, a_1)$:

$$(a_1, a_2, a_3) \xrightarrow{(132)} (a_2, a_3, a_1),$$

and of course the identity element $e$ leaves the triple $(a_1, a_2, a_3)$ alone:

$$(a_1, a_2, a_3) \xrightarrow{e} (a_1, a_2, a_3).$$


By composition we can multiply the permutations together, to get the 
following multiplication table for S3: 
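
The table can be recomputed mechanically. The following Python sketch is one way to do so, under two assumptions that are not fixed by the text: a permutation is stored as the tuple of positions to which the entries in positions 1, 2, 3 are sent, and the product s * t means "apply t first, then s". With the opposite order of composition, the rows and columns of the table are interchanged.

```python
# Each permutation is a tuple s, where s[i - 1] is the position to which the
# entry in position i is sent (an assumed encoding, chosen so that the action
# on ordered triples matches the examples in the text).
NAMES = {
    (1, 2, 3): "e",
    (2, 1, 3): "(12)",
    (3, 2, 1): "(13)",
    (1, 3, 2): "(23)",
    (2, 3, 1): "(123)",
    (3, 1, 2): "(132)",
}
ORDER = list(NAMES)

def act(s, triple):
    """Move the entry in position i of the triple to position s(i)."""
    out = [None] * 3
    for i, entry in enumerate(triple):
        out[s[i] - 1] = entry
    return tuple(out)

def compose(s, t):
    """Return the permutation 'apply t first, then s' (assumed convention)."""
    return tuple(s[t[i] - 1] for i in range(3))

# table[row, column] is the name of row * column.
table = {(NAMES[s], NAMES[t]): NAMES[compose(s, t)] for s in ORDER for t in ORDER}

print(" " * 6, " ".join(NAMES[t].ljust(6) for t in ORDER))
for s in ORDER:
    row = " ".join(table[NAMES[s], NAMES[t]].ljust(6) for t in ORDER)
    print(NAMES[s].ljust(6), row)
```

Under these conventions the entries for (12) * (13) and (13) * (12) differ, which is one way to see from the table that the group is not commutative.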
Note that $S_3$ is not commutative. In fact, $S_3$ is the smallest possible
noncommutative group. In honor of one of the founders of group theory, Niels
Abel, we have:


Definition 11.1.2 A group that is commutative is abelian. 


The integers Z under addition form an abelian group. Most groups are 
not abelian. 

We want to understand all groups. Of course, this is not actually doable. 
Hopefully we can at least build up groups from possibly simpler, more basic 
groups. To start this process, we make the following definition: 


Definition 11.1.3 A nonempty subset H of G is a subgroup if H is itself 
a group, using the binary operation of G. 


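For example, let


$$H = \left\{ \begin{pmatrix} a_{11} & a_{12} & 0 \\ a_{21} & a_{22} & 0 \\ 0 & 0 & 1 \end{pmatrix} : \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \in GL(2,\mathbb{R}) \right\}.$$


Then H is a subgroup of the group GL(3,R) of invertible 3 x 3 matrices.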
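

Definition 11.1.4 Let $G$ and $\hat{G}$ be two groups. Then a function

$$\sigma : G \to \hat{G}$$

is a group homomorphism if for all $g_1, g_2 \in G$,

$$\sigma(g_1 \cdot g_2) = \sigma(g_1) \cdot \sigma(g_2).$$


For example, let $A \in GL(n,\mathbb{R})$. Define $\sigma : GL(n,\mathbb{R}) \to GL(n,\mathbb{R})$ by

$$\sigma(B) = A^{-1} B A.$$

Then for any two matrices $B, C \in GL(n,\mathbb{R})$, we have

$$\sigma(BC) = A^{-1} B C A = A^{-1} B A A^{-1} C A = \sigma(B) \cdot \sigma(C).$$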
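
The homomorphism property of conjugation can be checked numerically. The sketch below does this for 2 x 2 matrices with exact rational entries; the matrices A, B and C are illustrative choices rather than anything specified in the text, and the helper names are hypothetical.

```python
from fractions import Fraction

def matmul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def inverse2(M):
    """Inverse of an invertible 2 x 2 matrix with Fraction entries."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]

def sigma(A, B):
    """Conjugation by A: B is sent to A^{-1} B A."""
    return matmul(matmul(inverse2(A), B), A)

F = Fraction
A = [[F(2), F(1)], [F(1), F(1)]]
B = [[F(1), F(2)], [F(3), F(4)]]
C = [[F(0), F(1)], [F(1), F(1)]]

lhs = sigma(A, matmul(B, C))            # sigma(BC)
rhs = matmul(sigma(A, B), sigma(A, C))  # sigma(B) sigma(C)
print(lhs == rhs)
```

A single numerical check is of course not a proof; the algebraic identity above is what shows the property for all matrices.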


There is a close relationship between group homomorphisms and a spe- 
cial class of subgroup. Before we can exhibit this, we need: 


Definition 11.1.5 Let $H$ be a subgroup of $G$. The (left) cosets of $H$ in $G$ are
all sets of the form

$$gH = \{gh : h \in H\},$$

for $g \in G$.

This defines an equivalence relation on $G$, with

$$g \sim \hat{g}$$

if the set $gH$ is equal to the set $\hat{g}H$, i.e., if there is an $h \in H$ with $gh = \hat{g}$.
In a natural way, the right cosets are the sets

$$Hg = \{hg : h \in H\},$$

which also define an equivalence relation on the group $G$.


Definition 11.1.6 A subgroup $H$ is normal if for all $g$ in $G$, $gHg^{-1} = H$.


Theorem 11.1.1 Let H be a subgroup of G. The set of cosets gH, under 
the binary operation 


$$gH \cdot \hat{g}H = g\hat{g}H,$$


will form a group if and only if H is a normal subgroup. (This group is 
denoted by G/H and pronounced G mod H.) 


Sketch of Proof: Most of the steps are routine. The main technical 
difficulty lies in showing that the binary operation 


$$(gH) \cdot (\hat{g}H) = (g\hat{g}H)$$


is well defined. Hence we must show that the set $gH \cdot \hat{g}H$, which consists
of the products of all elements of the set $gH$ with all elements of the set
$\hat{g}H$, is equal to the set $g\hat{g}H$. Since $H$ is normal, we have


$$\hat{g} H \hat{g}^{-1} = H.$$


Then as sets

$$\hat{g}H = H\hat{g}.$$

Thus

$$gH\hat{g}H = g\hat{g}H \cdot H = g\hat{g}H,$$

since $H \cdot H = H$, as $H$ is a subgroup. The map is well defined.

The identity element of $G/H$ is $e \cdot H$. The inverse to $gH$ is $g^{-1}H$.
Associativity follows from the associativity of the group $G$. $\Box$

Note that in writing $gH \cdot \hat{g}H = g\hat{g}H$, one must keep in mind that $H$
is representing every element in H and thus that H is itself not a single 
element. 
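
The role of normality can be seen by brute force in the permutation group on three elements. The Python sketch below is only an illustration: permutations are encoded as tuples of images of 0, 1, 2, composition applies the right-hand factor first (both assumed conventions), and the two subgroups are chosen by hand.

```python
from itertools import permutations

def compose(s, t):
    """Apply t first, then s; permutations are tuples of images of 0, 1, 2."""
    return tuple(s[t[i]] for i in range(len(t)))

G = list(permutations(range(3)))   # the six elements of S_3

def left_cosets(H):
    """The set of all left cosets gH, each stored as a frozenset."""
    return {frozenset(compose(g, h) for h in H) for g in G}

def product_of_sets(X, Y):
    """All products xy with x in X and y in Y."""
    return frozenset(compose(x, y) for x in X for y in Y)

def coset_product_is_well_defined(H):
    """True if every set product of two left cosets is again one left coset."""
    cosets = left_cosets(H)
    return all(product_of_sets(X, Y) in cosets for X in cosets for Y in cosets)

e = (0, 1, 2)
A3 = [e, (1, 2, 0), (2, 0, 1)]   # identity and the two 3-cycles: normal
H2 = [e, (1, 0, 2)]              # identity and one transposition: not normal

print(coset_product_is_well_defined(A3))
print(coset_product_is_well_defined(H2))
```

For the normal subgroup the products of cosets are again cosets, so the quotient has two elements. For the two-element subgroup some products of cosets contain four elements and so cannot be a single coset.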

As an application of this new group G/H, we now define the cyclic 
groups Z/nZ. Here our initial group is the integers Z and our subgroup 
consists of all the multiples of some fixed integer n: 


$$n\mathbb{Z} = \{nk : k \in \mathbb{Z}\}.$$

Since the integers form an abelian group, every subgroup, including nZ, is
normal and thus Z/nZ will form a group. It is common to represent each
coset in Z/nZ by an integer between 0 and n - 1:


$$\mathbb{Z}/n\mathbb{Z} = \{0, 1, 2, \ldots, n-1\}.$$


For example, if we let n = 6, we have Z/6Z = {0,1,2,3,4,5}. The addition
table is then

    + | 0 1 2 3 4 5
    --+------------
    0 | 0 1 2 3 4 5
    1 | 1 2 3 4 5 0
    2 | 2 3 4 5 0 1
    3 | 3 4 5 0 1 2
    4 | 4 5 0 1 2 3
    5 | 5 0 1 2 3 4


An enjoyable exercise is proving the following critical theorem relating 
normal subgroups and group homomorphisms. 


Theorem 11.1.2 Let $\sigma : G \to \hat{G}$ be a group homomorphism. If

$$\ker(\sigma) = \{g \in G : \sigma(g) = \hat{e}, \text{ the identity of } \hat{G}\},$$

then $\ker(\sigma)$ is a normal subgroup of $G$. (This subgroup $\ker(\sigma)$ is called the
kernel of the map $\sigma$.)


The study of groups is to a large extent the study of normal subgroups. 
By the above, this is equivalent to the study of group homomorphisms and 
is an example of the mid-twentieth century tack of studying an object by 
studying its homomorphisms. 

The key theorem in finite group theory, Sylow’s Theorem, deduces the
existence of subgroups from knowledge of the number of elements in a
group.


Definition 11.1.7 The order of a group $G$, denoted by $|G|$, is equal to
the number of elements in $G$.


For example, $|S_3| = 6$.


Theorem 11.1.3 (Sylow’s Theorem) Let $G$ be a finite group.
a) Let $p$ be a prime number. Suppose that $p^{\alpha}$ divides $|G|$. Then $G$ has
a subgroup of order $p^{\alpha}$.
b) If $p^n$ divides $|G|$ but $p^{n+1}$ does not, then for any two subgroups $H$
and $\hat{H}$ of order $p^n$, there is an element $g \in G$ with $gHg^{-1} = \hat{H}$.

c) If $p^n$ divides $|G|$ but $p^{n+1}$ does not, then the number of subgroups
of order $p^n$ is $1 + kp$, for some nonnegative integer $k$.


Proofs can be found in Herstein’s Topics in Algebra [57], Section 2.12. 
The importance lies in that we gather quite a bit of information about 
a finite group from merely knowing how many elements it has. 
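
As a small sanity check of Sylow's Theorem, one can list the subgroups of the permutation group on three elements, whose order is 6 = 2 * 3. The Python sketch below enumerates subsets by brute force, which is only feasible for very small groups; the encoding of permutations as tuples of images of 0, 1, 2 is an assumption made for the example.

```python
from itertools import combinations, permutations

def compose(s, t):
    """Apply t first, then s; permutations are tuples of images of 0, 1, 2."""
    return tuple(s[t[i]] for i in range(len(t)))

G = list(permutations(range(3)))   # S_3, of order 6

def is_subgroup(S):
    """A nonempty finite subset closed under the operation is a subgroup."""
    return all(compose(a, b) in S for a in S for b in S)

def subgroups_of_order(m):
    """All subgroups of G with exactly m elements, found by brute force."""
    return [set(c) for c in combinations(G, m) if is_subgroup(set(c))]

counts = {p: len(subgroups_of_order(p)) for p in (2, 3)}
print(len(G), counts)
```

Here there are three subgroups of order 2 and one subgroup of order 3, that is, counts of the form 1 + kp with k = 1 for p = 2 and k = 0 for p = 3.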


11.2 Representation Theory 


Certainly one of the basic examples of groups is that of invertible n x 
n matrices. Representation theory studies how any given abstract group 
can be realized as a group of matrices. Since n x n matrices, via matrix 
multiplication on column vectors, are linear transformations from a vector 
space to itself, we can rephrase representation theory as the study of how 
a group can be realized as a group of linear transformations. 

If V is a vector space, let GL(V) denote the group of invertible linear
transformations from V to itself.


Definition 11.2.1 A representation of a group $G$ on a vector space $V$ is
a group homomorphism

$$\rho : G \to GL(V).$$


We say that $\rho$ is a representation of $G$.


For example, consider the group S3 of permutations on three elements. 
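There is quite a natural representation of $S_3$ on three space $\mathbb{R}^3$. Let


$$\rho(\sigma) \begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix} = \begin{pmatrix} a_{\sigma(1)} \\ a_{\sigma(2)} \\ a_{\sigma(3)} \end{pmatrix}.$$


For example, if $\sigma = (12)$, then


$$\rho(12) \begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix} = \begin{pmatrix} a_2 \\ a_1 \\ a_3 \end{pmatrix}.$$

As a matrix, we have:


$$\rho(12) = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$


If $\sigma = (123)$, then since (123) permutes $(a_1, a_2, a_3)$ to $(a_3, a_1, a_2)$, we
have


$$\rho(123) \begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix} = \begin{pmatrix} a_3 \\ a_1 \\ a_2 \end{pmatrix}.$$

As a matrix,

$$\rho(123) = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}.$$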


The explicit matrices representing the other elements of S3 are left as an 
exercise at the end of the chapter. 
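
The two matrices above, and the fact that the assignment is a group homomorphism, can be checked by machine. The Python sketch below assumes a particular encoding that is not spelled out in the text: a permutation is a tuple s of 0-indexed positions, the entry in position i is moved to position s[i], and composition applies the right-hand factor first.

```python
from itertools import permutations

def compose(s, t):
    """Apply t first, then s."""
    return tuple(s[t[i]] for i in range(3))

def rho(s):
    """Matrix that moves the entry in position i to position s[i]."""
    M = [[0] * 3 for _ in range(3)]
    for i in range(3):
        M[s[i]][i] = 1
    return M

def matmul(A, B):
    """Multiply two 3 x 3 matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

S3 = list(permutations(range(3)))

# rho(s t) = rho(s) rho(t) for all pairs, i.e. rho is a homomorphism.
is_homomorphism = all(
    rho(compose(s, t)) == matmul(rho(s), rho(t)) for s in S3 for t in S3
)

swap12 = (1, 0, 2)     # (12), written 0-indexed
cycle123 = (1, 2, 0)   # (123): sends (a1, a2, a3) to (a3, a1, a2)

print(is_homomorphism)
print(rho(swap12))
print(rho(cycle123))
```

With this encoding the matrices printed for (12) and (123) agree with the ones displayed above.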

The goal of representation theory is to find all possible representations 
for a given group. In order to even be able to start to make sense out of 
this question, we first see how to build new representations out of old. 


Definition 11.2.2 Let $G$ be a group. Suppose we have representations of
$G$:

$$\rho_1 : G \to GL(V_1)$$

and

$$\rho_2 : G \to GL(V_2),$$

where $V_1$ and $V_2$ are possibly different vector spaces. Then the direct sum
representation of $G$ on $V_1 \oplus V_2$, denoted by

$$(\rho_1 \oplus \rho_2) : G \to GL(V_1 \oplus V_2),$$

is defined for all $g \in G$ by:

$$(\rho_1 \oplus \rho_2)(g) = \rho_1(g) \oplus \rho_2(g).$$


Note that when we write out $\rho_1(g) \oplus \rho_2(g)$ as a matrix, it will be in block
diagonal form. 

If we want to classify representations, we should concentrate on finding
those representations that are not direct sums of other representations.
This leads to:


Definition 11.2.3 A representation $\rho$ of a group $G$ on a nonzero vector
space $V$ is irreducible if there is no proper nonzero subspace $W$ of $V$ such that for
all $g \in G$ and all $w \in W$,

$$\rho(g) w \in W.$$


In particular if a representation is the direct sum of two other representa- 
tions, it will certainly not be irreducible. Tremendous progress has been 
made in finding all irreducible representations for many specific groups. 
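
For instance, the permutation representation of the permutation group on three elements, acting on three space as described earlier, is not irreducible: the line spanned by (1, 1, 1) is sent to itself by every permutation. The Python sketch below checks this, together with the invariance of the plane of vectors whose coordinates sum to zero. The encoding of permutations as 0-indexed tuples is an assumption made for the example.

```python
from itertools import permutations

def rho_apply(s, v):
    """Apply rho(s) to v: the entry in position i goes to position s[i]."""
    out = [0] * len(v)
    for i, entry in enumerate(v):
        out[s[i]] = entry
    return tuple(out)

S3 = list(permutations(range(3)))

# W = span{(1, 1, 1)} is a proper nonzero subspace of three space.
w = (1, 1, 1)
line_is_invariant = all(rho_apply(s, w) == w for s in S3)

# The plane of vectors with coordinate sum zero is spanned by these two
# vectors; it is enough to check that their images stay in the plane.
plane_basis = [(1, -1, 0), (0, 1, -1)]
plane_is_invariant = all(
    sum(rho_apply(s, u)) == 0 for s in S3 for u in plane_basis
)

print(line_is_invariant, plane_is_invariant)
```

Both subspaces are invariant, which reflects the fact that this three-dimensional representation splits as a direct sum of a one-dimensional and a two-dimensional representation.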

Representation theory occurs throughout nature. Any time you have 
a change of coordinate systems, suddenly representations appear. In fact, 
most theoretical physicists will even define an elementary particle (such 
as an electron) as an irreducible representation of some group (a group 
that captures the intrinsic symmetries of the world). For more on this, 
see Sternberg’s Group Theory and Physics [106], especially the last part of 
Chapter 3.9. 


11.3 Rings 


If groups are roughly viewed as sets for which there is an addition, then
rings are sets for which there is both an addition and a multiplication. 


Definition 11.3.1 A nonempty set $R$ is a ring if there are two binary
operations, denoted by $\cdot$ and $+$, on $R$ such that

a) $R$ with $+$ forms an abelian group. The identity is denoted by 0.

b) (Associativity) for all $a, b, c \in R$, $a \cdot (b \cdot c) = (a \cdot b) \cdot c$.

c) (Distributivity) for all $a, b, c \in R$,


$$a \cdot (b + c) = a \cdot b + a \cdot c$$


and

$$(a + b) \cdot c = a \cdot c + b \cdot c.$$


Note that rings are not required to be commutative for the $\cdot$ operation or,
in other words, we do not require $a \cdot b = b \cdot a$.

If there exists an element $1 \in R$ with $1 \cdot a = a \cdot 1 = a$ for all $a \in R$,
we say that $R$ is a ring with unit element. Almost all rings that are ever
encountered in life will have a unit element.

The integers $\mathbb{Z} = \{\ldots, -3, -2, -1, 0, 1, 2, 3, \ldots\}$, with the usual addition
and multiplication, form a ring. Polynomials in one variable $x$ with complex
coefficients, denoted by $\mathbb{C}[x]$, form a ring with the usual addition and
multiplication of polynomials. In fact, polynomials in $n$ variables $x_1, \ldots, x_n$
with complex coefficients, denoted by $\mathbb{C}[x_1, \ldots, x_n]$, will also form a ring
in the natural way. By the way, the study of the ring theoretic properties
of $\mathbb{C}[x_1, \ldots, x_n]$ is at the heart of much of algebraic geometry. While
polynomials with complex coefficients are the most common to study, it is of
course the case that polynomials with integer coefficients ($\mathbb{Z}[x_1, \ldots, x_n]$),
polynomials with rational coefficients ($\mathbb{Q}[x_1, \ldots, x_n]$) and polynomials with
real coefficients ($\mathbb{R}[x_1, \ldots, x_n]$) are also rings. In fact, if $R$ is any ring, then
the polynomials with coefficients in $R$ form a ring, denoted by $R[x_1, \ldots, x_n]$.


Definition 11.3.2 A function $\sigma : R \to \hat{R}$ between rings $R$ and $\hat{R}$ is a ring
homomorphism if for all $a, b \in R$,


$$\sigma(a + b) = \sigma(a) + \sigma(b)$$


and


$$\sigma(a \cdot b) = \sigma(a) \cdot \sigma(b).$$


Definition 11.3.3 A subset $I$ of a ring $R$ is an ideal if $I$ is a subgroup of
$R$ under $+$ and if, for any $a \in R$, $aI \subset I$ and $Ia \subset I$.


The notion of an ideal in ring theory corresponds to the notion of a 
normal subgroup in group theory. This analogy is shown in the following 
theorems: 


Theorem 11.3.1 Let $\sigma : R \to \hat{R}$ be a ring homomorphism. Then the set

$$\ker(\sigma) = \{a \in R : \sigma(a) = 0\}$$

is an ideal in $R$. (This ideal $\ker(\sigma)$ is called the kernel of the map $\sigma$.)

Sketch of Proof: We need to use that for all $x \in \hat{R}$,

$$x \cdot 0 = 0 \cdot x = 0,$$

which is an exercise at the end of the chapter. Let $b \in \ker(\sigma)$. Thus
$\sigma(b) = 0$. Given any element $a \in R$, we want $a \cdot b \in \ker(\sigma)$ and $b \cdot a \in
\ker(\sigma)$. We have


$$\sigma(a \cdot b) = \sigma(a) \cdot \sigma(b) = \sigma(a) \cdot 0 = 0,$$


implying that $a \cdot b \in \ker(\sigma)$.
By a similar argument, $b \cdot a \in \ker(\sigma)$, showing that $\ker(\sigma)$ is indeed an
ideal. $\Box$
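
A familiar instance is reduction of integers modulo a fixed number. The Python sketch below takes the modulus 6 purely as an illustration and checks, on a finite window of integers, that reduction respects addition and multiplication and that products with elements of the kernel stay in the kernel. A finite check of this kind is evidence, not a proof.

```python
N = 6  # illustrative modulus, not taken from the text

def sigma(a):
    """Reduce an integer modulo N, giving a representative in 0, ..., N - 1."""
    return a % N

window = range(-20, 21)  # a finite sample of the integers

respects_addition = all(
    sigma(a + b) == (sigma(a) + sigma(b)) % N for a in window for b in window
)
respects_multiplication = all(
    sigma(a * b) == (sigma(a) * sigma(b)) % N for a in window for b in window
)

# Elements of the window that are sent to 0, i.e. multiples of N.
kernel_sample = [b for b in window if sigma(b) == 0]

kernel_absorbs_products = all(
    sigma(a * b) == 0 and sigma(b * a) == 0
    for a in window for b in kernel_sample
)

print(kernel_sample)
print(respects_addition, respects_multiplication, kernel_absorbs_products)
```

The kernel here consists of the multiples of 6, and multiplying a multiple of 6 by any integer gives another multiple of 6, as the theorem predicts.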


Theorem 11.3.2 Let $I$ be an ideal in $R$. The sets $\{a + I : a \in R\}$ form
a ring, denoted $R/I$, under the operations $(a + I) + (b + I) = (a + b + I)$
and $(a + I) \cdot (b + I) = (a \cdot b + I)$.

The proof is left as a (long) exercise at the end of the chapter.

The study of a ring comes down to studying its ideals, or equivalently, its 
homomorphisms. Again, it’s a mid-twentieth century approach to translate 
the study of rings to the study of maps between rings. 


11.4 Fields and Galois Theory 


We are now ready to enter the heart of classical algebra. To a large extent, 
the whole point of high school algebra is to find roots of linear and quadratic 
polynomials. With more complicated, but in spirit, similar techniques, the 
roots for third and fourth degree polynomials can also be found. One of 
the main historical motivations for developing the machinery of group and 
ring theory was in showing that there can be no similar techniques for 
finding the roots of polynomials of fifth degree or higher. More specifically 
the roots of a fifth degree or higher polynomial cannot be obtained by a 
formula involving radicals of the coefficients of the polynomial. (For an 
historical account, see Edwards’ Galois Theory [31].) 

The key is to establish a correspondence between one variable polynomi- 
als and finite groups. This is the essence of Galois Theory, which explicitly 
connects the ability to express roots as radicals of coefficients (in analogue 
to the quadratic equation) with properties of the associated group. 

Before describing this correspondence, we need to discuss fields and field 
extensions. 


Definition 11.4.1 A ring $R$ is a field if

1. $R$ has a multiplicative unit 1,

2. for all $a, b \in R$ we have $a \cdot b = b \cdot a$ and

3. for any $a \neq 0$ in $R$, there is an element denoted by $a^{-1}$ with $a \cdot a^{-1} = 1$.


For example, since integers other than 1 and -1 do not have multiplicative
inverses in Z, Z is not a field. The rationals Q, the reals R and the complexes C are fields.
For the ring $\mathbb{C}[x]$ of one variable polynomials, there corresponds the field


$$\mathbb{C}(x) = \left\{ \frac{P(x)}{Q(x)} : P(x), Q(x) \in \mathbb{C}[x],\ Q(x) \neq 0 \right\}.$$


Definition 11.4.2 A field $\hat{k}$ is a field extension of a field $k$ if $k$ is contained
in $\hat{k}$.


For example, the complex numbers C is a field extension of the real numbers 
R. 

Once we have the notion of a field, we can form the ring $k[x]$ of one
variable polynomials with coefficients in $k$. Basic, but deep, is:


Theorem 11.4.1 Let $k$ be a field. Then there is a field extension $\hat{k}$ of $k$
such that every nonconstant polynomial in $\hat{k}[x]$ has a root in $\hat{k}$.


Such a field $\hat{k}$ is said to be algebraically closed. For a proof, see Garling’s A
Course in Galois Theory [45], Section 8.2. As a word of warning, the proof
uses the Axiom of Choice.

Before showing how groups are related to finding roots of polynomials,
recall that the root of a linear equation $ax + b = 0$ is simply $x = -\frac{b}{a}$. For
second degree equations, the roots of $ax^2 + bx + c = 0$ are of course

$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}.$$


Already interesting things are happening. Note that even if the three
coefficients $a$, $b$ and $c$ are real numbers, the roots will be complex if the
discriminant $b^2 - 4ac < 0$. Furthermore, even if the coefficients are rational
numbers, the roots need not be rational, as $\sqrt{b^2 - 4ac}$ need not be rational.
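
Both phenomena are easy to see numerically. The Python sketch below evaluates the quadratic formula with complex arithmetic; the function name quadratic_roots and the two sample polynomials are illustrative choices, not taken from the text.

```python
import cmath

def quadratic_roots(a, b, c):
    """Roots of a x^2 + b x + c = 0 from the quadratic formula (a nonzero)."""
    if a == 0:
        raise ValueError("the leading coefficient must be nonzero")
    root_of_discriminant = cmath.sqrt(b * b - 4 * a * c)
    return ((-b + root_of_discriminant) / (2 * a),
            (-b - root_of_discriminant) / (2 * a))

# x^2 + 1: real coefficients, negative discriminant, non-real roots.
imaginary_pair = quadratic_roots(1, 0, 1)

# x^2 - 2: rational coefficients, roots plus and minus the square root of 2.
irrational_pair = quadratic_roots(1, 0, -2)

print(imaginary_pair)
print(irrational_pair)
```

Floating point arithmetic can only approximate the square root of 2, which is one reason the exact, algebraic description of such roots developed below is needed.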

Both of these observations lead naturally to extension fields of the field 
of coefficients. We will restrict to the case when the coefficients of our 
(monic) polynomial are rational numbers. 

Let 

$$P(x) = x^n + a_{n-1}x^{n-1} + \cdots + a_0,$$


with each $a_k \in \mathbb{Q}$. By the Fundamental Theorem of Algebra (which states
that the algebraic closure of the real numbers is the complex numbers),
there are complex numbers $\alpha_1, \ldots, \alpha_n$ with


$$P(x) = (x - \alpha_1)(x - \alpha_2) \cdots (x - \alpha_n).$$


Of course, the whole problem is that the fundamental theorem does not 
tell us what the roots are. We would like an analogue of the quadratic 
equation for any degree polynomial. As mentioned before, such analogues 
do exist for cubic and quartic polynomials, but the punchline of Galois 
Theory is that no such analogue exists for degree five or higher polynomials. 
The proof of such a statement involves far more than the tools of high school 
algebra. 

Here is a rapid fire summary of Galois Theory. We will associate to 
each one variable polynomial with rational coefficients a unique finite di- 
mensional vector space over the rational numbers that is also a field exten- 
sion of the rational numbers contained in the complex numbers. Namely, 
if $\alpha_1, \ldots, \alpha_n$ are the roots of the polynomial $P(x)$, the smallest field in the
complex numbers that contains both the rationals and the roots $\alpha_1, \ldots, \alpha_n$
is the desired vector space. We then look at all linear transformations from
this vector space to itself, with the strong restriction that the linear
transformation is also a field automorphism mapping each rational number to
itself. This is such a strong restriction that there are only a finite number
of such transformations, forming a finite group. Further, each such linear
transformation will not only map each root of $P(x)$ to another root but
is actually determined by how it maps the roots to each other. Thus the
finite group of these special linear transformations is a subgroup of the
permutation group on n letters. The final deep result lies in showing that
these finite groups determine properties about the roots.

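Now for some details. We assume that $P(x)$ is irreducible in $\mathbb{Q}[x]$,
meaning that $P(x)$ is not the product of two polynomials in $\mathbb{Q}[x]$ of lower
positive degree. Hence, provided the degree of $P(x)$ is at least two,
none of the roots $\alpha_i$ of $P(x)$ can be rational numbers.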


Definition 11.4.3 Let $\mathbb{Q}(\alpha_1, \ldots, \alpha_n)$ be the smallest subfield of $\mathbb{C}$
containing both $\mathbb{Q}$ and the roots $\alpha_1, \ldots, \alpha_n$.


Definition 11.4.4 Let $E$ be a field extension of $\mathbb{Q}$ but contained in $\mathbb{C}$.
We say $E$ is a splitting field if there is a polynomial $P(x) \in \mathbb{Q}[x]$ such that
$E = \mathbb{Q}(\alpha_1, \ldots, \alpha_n)$, where $\alpha_1, \ldots, \alpha_n$ are the roots in $\mathbb{C}$ of $P(x)$.


A splitting field $E$ over the rational numbers $\mathbb{Q}$ is in actual fact a vector
space over $\mathbb{Q}$. For example, the splitting field $\mathbb{Q}(\sqrt{2})$ is a two-dimensional
vector space, since any element can be written uniquely as $a + b\sqrt{2}$, with
$a, b \in \mathbb{Q}$.


Definition 11.4.5 Let $E$ be an extension field of $\mathbb{Q}$. The group of
automorphisms $G$ of $E$ over $\mathbb{Q}$ is the set of all field automorphisms $\sigma : E \to E$.


By field automorphism we mean a ring homomorphism from the field $E$
to itself that is one-to-one, onto, maps unit to unit and whose inverse is a
ring homomorphism. Note that field automorphisms of an extension field
have the property that each rational number is mapped to itself (this is an
exercise at the end of the chapter).

Such field automorphisms can be interpreted as linear transformations 
of E to itself. But not all linear transformations are field automorphisms, 
as will be seen in a moment. 
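
As a small illustration, consider the field obtained from the rationals by adjoining the square root of 2, with elements stored as pairs (a, b) of rationals. The Python sketch below compares two maps that are both linear over the rationals: conjugation, which changes the sign of b, and a map called stretch, which doubles b. The class QSqrt2, both function names and the sample elements are hypothetical choices made for this example.

```python
from fractions import Fraction

class QSqrt2:
    """The element a + b*sqrt(2), stored as the pair (a, b) of rationals."""

    def __init__(self, a, b=0):
        self.a, self.b = Fraction(a), Fraction(b)

    def __add__(self, other):
        return QSqrt2(self.a + other.a, self.b + other.b)

    def __mul__(self, other):
        # (a + b r)(c + d r) = (ac + 2bd) + (ad + bc) r, using r * r = 2.
        return QSqrt2(self.a * other.a + 2 * self.b * other.b,
                      self.a * other.b + self.b * other.a)

    def __eq__(self, other):
        return (self.a, self.b) == (other.a, other.b)

def conjugate(x):
    """a + b*sqrt(2) is sent to a - b*sqrt(2)."""
    return QSqrt2(x.a, -x.b)

def stretch(x):
    """a + b*sqrt(2) is sent to a + 2b*sqrt(2): linear, used as a counterexample."""
    return QSqrt2(x.a, 2 * x.b)

samples = [QSqrt2(1, 2), QSqrt2(Fraction(1, 3), -1), QSqrt2(0, 1), QSqrt2(5)]

def is_additive(f):
    return all(f(x + y) == f(x) + f(y) for x in samples for y in samples)

def is_multiplicative(f):
    return all(f(x * y) == f(x) * f(y) for x in samples for y in samples)

print(is_additive(conjugate), is_multiplicative(conjugate))
print(is_additive(stretch), is_multiplicative(stretch))
```

On these samples conjugation respects both operations, while stretch respects addition but not multiplication: it fixes 2, yet sends the square root of 2 to an element whose square is 8.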

Of course, there is needed here, in a complete treatment, a lemma show- 
ing that this set of automorphisms actually forms a group. 


Definition 11.4.6 Given an extension field $E$ over $\mathbb{Q}$ with group of
automorphisms $G$, the fixed field of $G$ is the set $\{e \in E : \sigma(e) = e, \text{ for all } \sigma \in G\}$.


Note that we are restricting attention to those field automorphisms that
contain $\mathbb{Q}$ in the fixed field. Further it can be shown that the fixed field is
indeed a subfield of $E$.


Definition 11.4.7 A field extension $E$ of $\mathbb{Q}$ is normal if the fixed field of
the group of automorphisms $G$ of $E$ over $\mathbb{Q}$ is exactly $\mathbb{Q}$.


Let $G$ be the group of automorphisms of $\mathbb{Q}(\alpha_1, \ldots, \alpha_n)$ over $\mathbb{Q}$, where
$\mathbb{Q}(\alpha_1, \ldots, \alpha_n)$ is the splitting field of the polynomial


$$P(x) = (x - \alpha_1)(x - \alpha_2) \cdots (x - \alpha_n) = x^n + a_{n-1}x^{n-1} + \cdots + a_0,$$


with each $a_k \in \mathbb{Q}$. This group $G$ is connected to the roots of the polynomial
$P(x)$, as seen in:


Theorem 11.4.2 The group of automorphisms $G$ is a subgroup of the
permutation group $S_n$ on n elements. It is represented by permuting the roots
of the polynomial $P(x)$.


Sketch of Proof: We will show that for any automorphism $\sigma$ in the
group $G$, the image of every root $\alpha_i$ is another root of $P(x)$. Therefore the
automorphisms will merely permute the n roots of $P(x)$. It will be critical
that $\sigma(a) = a$ for all rational numbers $a$. Now


$$\begin{aligned}
P(\sigma(\alpha_i)) &= (\sigma(\alpha_i))^n + a_{n-1}(\sigma(\alpha_i))^{n-1} + \cdots + a_0 \\
&= \sigma(\alpha_i^n) + \sigma(a_{n-1}\alpha_i^{n-1}) + \cdots + \sigma(a_0) \\
&= \sigma(\alpha_i^n + a_{n-1}\alpha_i^{n-1} + \cdots + a_0) \\
&= \sigma(P(\alpha_i)) \\
&= \sigma(0) \\
&= 0.
\end{aligned}$$


Thus $\sigma(\alpha_i)$ is another root. To finish the proof, which we will not do, we
would need to show that an automorphism $\sigma$ in $G$ is completely determined
by its action on the roots $\alpha_i$. $\Box$

All of this culminates in: 


Theorem 11.4.3 (Fundamental Theorem of Galois Theory) Let
$P(x)$ be an irreducible polynomial in $\mathbb{Q}[x]$ and let $E = \mathbb{Q}(\alpha_1, \ldots, \alpha_n)$ be its
splitting field with $G$ the automorphism group of $E$.
i) Each field $B$ containing $\mathbb{Q}$ and contained in $E$ is the fixed field of a
subgroup of $G$. Denote this subgroup by $G_B$.
ii) The field extension $B$ of $\mathbb{Q}$ is normal if and only if the subgroup $G_B$
is a normal subgroup of $G$.
iii) The rank of $E$ as a vector space over $B$ is the order of $G_B$. The
rank of $B$ as a vector space over $\mathbb{Q}$ is the order of the group $G/G_B$.

Unfortunately, in this brevity, none of the implications should be at all
clear. It is not even apparent why this should be called the Fundamental
Theorem of the subject. A brief hint or whisper of its importance is that
it sets up a dictionary between field extensions $B$ with $\mathbb{Q} \subset B \subset E$ and
subgroups $G_B$ of $G$. A see-saw type diagram would be


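$$\begin{array}{ccc}
E = \mathbb{Q}(\alpha_1, \ldots, \alpha_n) & \qquad & G \\
\cup & & \cup \\
E_1 & & G_{E_2} \\
\cup & & \cup \\
E_2 & & G_{E_1} \\
\cup & & \cup \\
\mathbb{Q} & & (e)
\end{array}$$


Here each subgroup is matched with its fixed field: $G$ with $\mathbb{Q}$, $G_{E_2}$ with $E_2$,
$G_{E_1}$ with $E_1$ and $(e)$ with $E$.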

But what does this have to do with finding the roots of a polynomial?
Our goal (which Galois Theory shows to be impossible) is to find an ana- 
logue of the quadratic equation. We need to make this more precise. 


Definition 11.4.8 A polynomial $P(x)$ is solvable if its splitting field
$\mathbb{Q}(\alpha_1, \ldots, \alpha_n)$ lies in an extension field of $\mathbb{Q}$ obtained by adding radicals of
integers.


As an example, the field $\mathbb{Q}(\sqrt[3]{2}, \sqrt[5]{7})$ is obtained from $\sqrt[3]{2}$ and $\sqrt[5]{7}$,
both of which are radicals. On the other hand, the field $\mathbb{Q}(\pi)$ is not obtained
by adding radicals to $\mathbb{Q}$; this is a rewording of the deep fact that $\pi$ is
transcendental.

The quadratic equation $x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$ shows that each root of a second
degree polynomial can be written in terms of a radical of its coefficients;
hence every second degree polynomial is solvable. To show that no ana- 
logue of the quadratic equations exists for fifth degree or higher equations, 
all we need to show is that not all such polynomials are solvable. We want 
to describe this condition in terms of the polynomial’s group of automor- 
phisms. 


Definition 11.4.9 A finite group $G$ is solvable if there is a nested sequence
of subgroups $G_1, \ldots, G_n$ with $G = G_0 \supset G_1 \supset G_2 \supset \cdots \supset G_n = (e)$, with
each $G_i$ normal in $G_{i-1}$ and each $G_{i-1}/G_i$ abelian.


The link between writing roots as radicals and groups is contained in: 


Theorem 11.4.4 A polynomial $P(x)$ is solvable if and only if its associated
group $G$ of automorphisms of its splitting field is solvable.


The impossibility of finding a clean formula for the roots of a high degree
polynomial in terms of radicals of the coefficients now follows from showing
that generically the group of automorphisms of an $n$th degree polynomial
is the full permutation group $S_n$ and


Theorem 11.4.5 The permutation group on $n$ elements, $S_n$, is not solvable
whenever $n$ is greater than or equal to five.
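

For small $n$ the statement of Theorem 11.4.5 can at least be checked by brute force. The Python sketch below is only an illustration, not a proof. It relies on a standard fact that is assumed here rather than shown in the text: a finite group is solvable exactly when repeatedly passing to its commutator subgroup eventually reaches $(e)$. All function names are made up for this example.

```python
from itertools import permutations

def compose(p, q):
    """Return the permutation p∘q (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(len(p)))

def inverse(p):
    """Return the inverse permutation of p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def generated_subgroup(generators, n):
    """All products of the generators; in a finite group this is a subgroup."""
    identity = tuple(range(n))
    elements = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = compose(g, s)
            if h not in elements:
                elements.add(h)
                frontier.append(h)
    return elements

def commutator_subgroup(group, n):
    """Subgroup generated by all commutators g h g^{-1} h^{-1}."""
    commutators = {
        compose(compose(g, h), compose(inverse(g), inverse(h)))
        for g in group for h in group
    }
    return generated_subgroup(commutators, n)

def is_solvable_symmetric_group(n):
    """Follow the chain of commutator subgroups of S_n until it stops shrinking."""
    group = set(permutations(range(n)))
    while len(group) > 1:
        smaller = commutator_subgroup(group, n)
        if len(smaller) == len(group):
            return False  # the chain is stuck above (e)
        group = smaller
    return True

for n in range(2, 6):
    print(n, is_solvable_symmetric_group(n))
```

The search is exhaustive, so it is only practical for very small $n$; it says nothing about the general case, for which the proofs cited below are needed.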


Of course, these are not obvious theorems. An excellent source for the
proofs is Artin’s Galois Theory [3].

Though there is no algebraic way of finding roots, there are many meth- 
ods to approximate the roots. This leads to many of the basic techniques 
in numerical analysis. 


11.5 Books 


Algebra books went through quite a transformation starting in the 1930s. 
It was then that Van der Waerden wrote his algebra book Modern Algebra 
[113], which was based on lectures of Emmy Noether. The first undergrad- 
uate text mirroring these changes was A Survey of Modern Algebra [9], by 
Garrett Birkhoff and Saunders Mac Lane. The undergraduate text of the 
sixties and seventies was Topics in Algebra by Herstein [57]. Current pop- 
ular choices are A First Course in Abstract Algebra by Fraleigh [41], and 
Contemporary Abstract Algebra by Gallian [43]. Serge Lang’s Algebra [79] 
has been for a long time a standard graduate text, though it is not the place 
to start learning algebra. You will find, in your mathematical career, that 
you will read many texts by Lang. Jacobson’s Basic Algebra [68], Artin’s 
Algebra [4] and Hungerford’s Algebra [65] are also good beginning graduate 
texts. 

Galois Theory is definitely one of the most beautiful subjects in math- 
ematics. Luckily there are a number of excellent undergraduate Galois 
Theory texts. One of the best (and cheapest) is Emil Artin’s Galois The- 
ory [3]. Other excellent texts are by Ian Stewart [107] and by Garling [45]. 
Edwards’ Galois Theory [31] is an historical development. For beginning 
representation theory, I would recommend Hill’s Groups and Characters 
[59] and Sternberg’s Group Theory and Physics [106]. 


11.6 Exercises


1. Fix a corner of this book as the origin (0,0,0) in space. Label one of 
the edges coming out of this corner as the x-axis, one as the y-axis and the
last one as the z-axis. The goal of this exercise is to show that rotations do 
not commute. Let A denote the rotation of the book about the x-axis by 
ninety degrees and let B be the rotation about the y-axis by ninety degrees. 
Show with your book and by drawing pictures of your book that applying 
the rotation A and then rotation B is not the same as applying rotation B 
first and then rotation A. 

2. Prove that the kernel of a group homomorphism is a normal subgroup. 
3. Let $R$ be a ring. Show that for all elements $x$ in $R$,

\[ x \cdot 0 = 0 \cdot x = 0, \]

even if the ring $R$ is not commutative.
4. Let $R$ be a ring and $I$ an ideal in the ring. Show that $R/I$ has a ring
structure. (This is a long exercise, but it is an excellent way to nail down
the basic definition of a ring.)
5. Show that the splitting field $\mathbf{Q}(\sqrt{2})$ over the rational numbers $\mathbf{Q}$ is a
two dimensional vector space over $\mathbf{Q}$.
6. Start with the permutation group S3. 

a. Find all subgroups of S3. 

b. Show that the group S3 is solvable. (This allows us to conclude that 
for cubic polynomials there is an analogue of the quadratic equation.) 
7. For each of the six elements of the group $S_3$, find the corresponding
matrices for the representation of S3 as described in section two of this 
chapter. 
8. If H is a normal subgroup of a group G, show that there is a natural 
one-to-one correspondence between the left and the right cosets of H. 
9. Let $E$ be a field containing the rational numbers $\mathbf{Q}$. Let $\sigma$ be a field
automorphism of $E$. Note that this implies in particular that $\sigma(1) = 1$.
Show that $\sigma(x) = x$ for all rational numbers $x$.


10. Let $T : G \to \tilde{G}$ be a group homomorphism. Show that $T(g^{-1}) =
(T(g))^{-1}$ for all $g \in G$.

11. Let $T : G \to \tilde{G}$ be a group homomorphism. Show that the groups
$G/\ker(T)$ and $\mathrm{Im}(T)$ are isomorphic. Here $\mathrm{Im}(T)$ denotes the image of
the group $G$ in the group $\tilde{G}$. This result is usually known as one of the
Fundamental Homomorphism Theorems. 


Chapter 12 


Lebesgue Integration 


Basic Object: Measure Spaces 


Basic Map: Integrable Functions 
Basic Goal: Lebesgue Dominating Convergence Theorem 





In calculus we learn about the Riemann integral of a function, which cer- 
tainly works for many functions. Unfortunately, we must use the word 
‘many’. Lebesgue measure, and from this the Lebesgue integral, will allow 
us to define the right notion of integration. Not only will we be able to 
integrate far more functions with the Lebesgue integral but we will also 
understand when the integral of a limit of functions is equal to the limit of
the integrals, i.e., when

\[ \lim_{n \to \infty} \int f_n = \int \lim_{n \to \infty} f_n, \]

which is the Lebesgue Dominating Convergence Theorem. In some sense,
the Lebesgue integral is the one that the gods intended us to use all along.

Our approach will be to develop the notion of Lebesgue measure for the 
real line R, then use this to define the Lebesgue integral. 


12.1 Lebesgue Measure 


The goal of this section is to define the Lebesgue measure of a set $E$ of real
numbers. This intuitively means we want to define the length of $E$. For
intervals

\[ E = [a,b] = \{x \in \mathbf{R} : a \le x \le b\} \]

the length of $E$ is simply

\[ \ell(E) = b - a. \]

[Figure: the interval from $a$ to $b$ on the real line, labeled "length $b - a$".]


The question is to determine the length of sets that are not intervals, such 
as 
E = {x € [0, 1] : x is a rational number}. 


We will heavily use that we already know the length of intervals. Let $E$
be any subset of reals. A countable collection of intervals $\{I_n\}$, with each

\[ I_n = [a_n, b_n], \]

covers the set $E$ if

\[ E \subset \bigcup I_n. \]

[Figure: a set $E$ covered by three intervals, $E \subset I_1 \cup I_2 \cup I_3$.]

Whatever the length or measure of $E$ is, it must be less than or equal to the
sum of the lengths of the $I_n$.


Definition 12.1.1 For any set $E$ in $\mathbf{R}$, the outer measure of $E$ is

\[ m^*(E) = \inf\Big\{ \sum (b_n - a_n) : \text{the collection of intervals } \{[a_n, b_n]\} \text{ covers } E \Big\}. \]


Definition 12.1.2 A set $E$ is measurable if for every set $A$,

\[ m^*(A) = m^*(A \cap E) + m^*(A - E). \]

The measure of a measurable set $E$, denoted by $m(E)$, is $m^*(E)$.


The reason for such a convoluted definition is that not all sets are measurable,
though no one will ever construct a nonmeasurable set, since the
existence of such a set requires the use of the Axiom of Choice, as we saw
in Chapter Ten.

There is another method of defining a measurable set, via the notion of
inner measure. Here we define the inner measure of a set $E$ to be

\[ m_*(E) = \sup\Big\{ \sum (b_n - a_n) : E \supset \bigcup I_n \text{ and } I_n = [a_n, b_n] \text{ with } a_n \le b_n \Big\}. \]

Thus instead of covering the set $E$ by a collection of open intervals, we fill
up the inside of $E$ with a collection of closed intervals.
If $m^*(E) < \infty$, then the set $E$ can be shown to be measurable if and
only if

\[ m^*(E) = m_*(E). \]

In either case, we now have a way of measuring the length of almost all
subsets of the real numbers.

As an example of how to use these definitions, we will show that the
measure of the set of rational numbers (denoted here as $E$) between 0
and 1 is zero. We will assume that this set $E$ is measurable and show its
outer measure is zero. It will be critical that the rationals are countable.
In fact, using this countability, list the rationals between zero and one as
$a_1, a_2, a_3, \ldots$. Now choose an $\epsilon > 0$. Let $I_1$ be the interval

\[ I_1 = \left[a_1 - \frac{\epsilon}{2},\ a_1 + \frac{\epsilon}{2}\right]. \]

Note that $\ell(I_1) = \epsilon$. Let

\[ I_2 = \left[a_2 - \frac{\epsilon}{4},\ a_2 + \frac{\epsilon}{4}\right]. \]

Here $\ell(I_2) = \frac{\epsilon}{2}$. Let

\[ I_3 = \left[a_3 - \frac{\epsilon}{8},\ a_3 + \frac{\epsilon}{8}\right]. \]

Here $\ell(I_3) = \frac{\epsilon}{4}$. In general let

\[ I_k = \left[a_k - \frac{\epsilon}{2^k},\ a_k + \frac{\epsilon}{2^k}\right]. \]

Then $\ell(I_k) = \frac{\epsilon}{2^{k-1}}$.

[Figure: the intervals $I_1$, $I_2$ and $I_3$ on the real line, centered at $a_1$, $a_2$ and $a_3$.]

Certainly the rationals between zero and one are covered by this countable
collection of intervals:

\[ E \subset \bigcup I_k. \]

Then

\[ m(E) \le \sum_{k=1}^{\infty} \ell(I_k) = \sum_{k=1}^{\infty} \frac{\epsilon}{2^{k-1}} = 2\epsilon. \]

By letting $\epsilon$ approach zero, we see that $m(E) = 0$.
A similar argument can be used to show that the measure of any count- 
able set is zero and in fact appears as an exercise at the end of this chapter. 
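
The covering argument can be tried out on a finite list of rationals. The following Python sketch assumes one particular enumeration (by increasing denominator, chosen only for illustration) and intervals of half-width $\epsilon/2^k$ around the $k$-th point; the function names are hypothetical. It uses exact fractions, so the total length can be compared with $2\epsilon$ without rounding.

```python
from fractions import Fraction

def rationals_in_unit_interval(max_denominator):
    """List each rational p/q in [0, 1] with q <= max_denominator exactly once."""
    found = []
    seen = set()
    for q in range(1, max_denominator + 1):
        for p in range(q + 1):
            r = Fraction(p, q)
            if r not in seen:
                seen.add(r)
                found.append(r)
    return found

def cover(points, eps):
    """Interval [a_k - eps/2^k, a_k + eps/2^k] around the k-th point, k = 1, 2, ..."""
    return [(a - eps / 2**k, a + eps / 2**k)
            for k, a in enumerate(points, start=1)]

def total_length(intervals):
    """Sum of the lengths b - a of the given intervals."""
    return sum(b - a for a, b in intervals)

eps = Fraction(1, 100)
points = rationals_in_unit_interval(12)
intervals = cover(points, eps)
print(len(points), float(total_length(intervals)), float(2 * eps))
```

However many rationals are listed, the total length of the cover stays below $2\epsilon$, which is the point of the geometric choice of lengths.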


12.2 The Cantor Set 


While long a source of examples and counterexamples in real analysis, the 
Cantor set has recently been playing a significant role in dynamical systems. 
It is an uncountable, nowhere dense measure zero subset of the unit interval 
$[0, 1]$. By nowhere dense, we mean that the closure of the complement of
the Cantor set will be the entire unit interval. We will first construct the
Cantor set, then show that it is both uncountable and has measure zero.

For each positive integer $k$, we will construct a subset $C_k$ of the unit
interval and then define the Cantor set $C$ to be

\[ C = \bigcap_{k=1}^{\infty} C_k. \]


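For $k = 1$, split the unit interval $[0,1]$ into thirds and remove the open
middle third, setting

\begin{align*}
C_1 &= [0,1] - \left(\tfrac{1}{3}, \tfrac{2}{3}\right) \\
&= \left[0, \tfrac{1}{3}\right] \cup \left[\tfrac{2}{3}, 1\right].
\end{align*}

[Figure: the set $C_1$ drawn on the unit interval, with marks at $0$, $\frac{1}{3}$, $\frac{2}{3}$ and $1$.]

Take these two intervals and split them into thirds. Now remove each of
their middle thirds to get

\[ C_2 = \left[0, \tfrac{1}{9}\right] \cup \left[\tfrac{2}{9}, \tfrac{1}{3}\right] \cup \left[\tfrac{2}{3}, \tfrac{7}{9}\right] \cup \left[\tfrac{8}{9}, 1\right]. \]

[Figure: the set $C_2$ drawn on the unit interval.]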


To get the next set $C_3$, split each of the four intervals of $C_2$ into three equal
parts and remove the open middle thirds, to get eight closed intervals, each
of length $\frac{1}{27}$. Continue this process for each $k$, so that each $C_k$ consists of
$2^k$ closed intervals, each of length $\frac{1}{3^k}$. Thus the length of each $C_k$ will be

\[ \text{length} = \frac{2^k}{3^k}. \]

The Cantor set $C$ is the intersection of all of these $C_k$:

\[ \text{Cantor set} = C = \bigcap_{k=1}^{\infty} C_k. \]


Part of the initial interest in the Cantor set was that it was both uncountable
and had measure zero. We will show first that the Cantor set has measure
zero and then that it is uncountable. Since $C$ is the intersection of all of
the $C_k$, we get for each $k$ that

\[ m(C) \le m(C_k) = \frac{2^k}{3^k}. \]

Since the fractions $\frac{2^k}{3^k}$ go to zero as $k$ goes to infinity, we see that

\[ m(C) = 0. \]
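
The construction of the sets $C_k$ is easy to carry out mechanically. The following Python sketch (function names are illustrative, not from the text) removes open middle thirds with exact fractions and reports the number of intervals and their total length, which should be $2^k$ and $2^k/3^k$.

```python
from fractions import Fraction

def cantor_step(intervals):
    """Remove the open middle third of every closed interval [a, b]."""
    result = []
    for a, b in intervals:
        third = (b - a) / 3
        result.append((a, a + third))
        result.append((b - third, b))
    return result

def cantor_level(k):
    """Return the closed intervals making up C_k, starting from [0, 1]."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(k):
        intervals = cantor_step(intervals)
    return intervals

def total_length(intervals):
    """Sum of the lengths of the intervals."""
    return sum(b - a for a, b in intervals)

for k in range(1, 6):
    level = cantor_level(k)
    print(k, len(level), total_length(level))
```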


It takes a bit more work to show that the Cantor set is uncountable.
The actual proof will come down to applying the trick of Cantor diagonalization,
as discussed in Chapter Ten. The first step is to express any real
number $\alpha$ in the unit interval $[0, 1]$ in its tri-adic expansion

\[ \alpha = \sum_{k=1}^{\infty} \frac{n_k}{3^k}, \]

where each $n_k$ is zero, one or two. (This is the three-analog of the decimal
expansion $\alpha = \sum_{k=1}^{\infty} \frac{n_k}{10^k}$, where here each $n_k = 0, 1, \ldots, 9$.) We can write
the tri-adic expansion in base three notation, to get

\[ \alpha = .n_1 n_2 n_3 \ldots \]

As with decimal expansion, the tri-adic expansion’s coefficients $n_k$ are
unique, provided we always round up. Thus we will always say that

\[ .102222\ldots = .11000\ldots \]


The Cantor set $C$ has a particularly clean description in terms of the
tri-adic or base three expansions. Namely

\[ C = \{ .n_1 n_2 n_3 \ldots \mid \text{each } n_k \text{ is either zero or two} \}. \]

Thus the effect of removing the middle thirds from all of the intervals
corresponds to allowing no 1’s among the coefficients. But then the Cantor
set can be viewed as the set of infinite sequences of 0’s and 2’s, which was
shown to be uncountable in the exercises of Chapter Ten.
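
A finite version of this description can be checked directly. The Python sketch below (with made-up function names) compares two lists: the numbers whose first $k$ base three digits are 0 or 2 and whose remaining digits are 0, and the left endpoints of the $2^k$ intervals obtained by removing middle thirds $k$ times. It only tests finitely many points, so it illustrates the correspondence rather than proving it.

```python
from fractions import Fraction
from itertools import product

def from_ternary_digits(digits):
    """Value of .n_1 n_2 ... n_k in base three."""
    return sum(Fraction(d, 3 ** k) for k, d in enumerate(digits, start=1))

def cantor_left_endpoints(k):
    """Left endpoints of the intervals of C_k, built by removing middle thirds."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(k):
        intervals = [
            piece
            for a, b in intervals
            for piece in ((a, a + (b - a) / 3), (b - (b - a) / 3, b))
        ]
    return sorted(a for a, _ in intervals)

k = 4
by_digits = sorted(from_ternary_digits(d) for d in product((0, 2), repeat=k))
by_removal = cantor_left_endpoints(k)
print(by_digits == by_removal)
```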


12.3. Lebesgue Integration 


One way to motivate integration is to try to find the area under curves. The 
Lebesgue integral will allow us to find the areas under some quite strange 
curves. 

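By definition the area of a unit square is one.

[Figure: a rectangle with sides $a$ and $b$, labeled "area $ab$".]

Let $E$ be a measurable set on $\mathbf{R}$. Recall that the characteristic function of
$E$, $\chi_E$, is defined by

\[ \chi_E(t) = \begin{cases} 1 & \text{if } t \in E \\ 0 & \text{if } t \in \mathbf{R} - E. \end{cases} \]

[Figure: the graph of $\chi_E(t)$.]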





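Since the height of $\chi_E$ is one, the area under the function (or curve) $\chi_E$
must be the length of $E$, or more precisely, $m(E)$. We denote this by $\int_E \chi_E$.
Then the area under the function $a \cdot \chi_E$ must be $a \cdot m(E)$,

[Figure: the graph of $a\chi_E$, with the region under it labeled "area $a\,m(E)$".]

which we denote by $\int_E a\chi_E$.
Now let $E$ and $F$ be disjoint measurable sets. Then the area under the
curve $a \cdot \chi_E + b \cdot \chi_F$ must be $a \cdot m(E) + b \cdot m(F)$,

[Figure: the graph of $a\chi_E + b\chi_F$, labeled "total area $= a\,m(E) + b\,m(F)$".]

denoted by

\[ \int_{E \cup F} a\chi_E + b\chi_F = a \cdot m(E) + b \cdot m(F). \]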

For a countable collection of disjoint measurable sets $A_i$, the function

\[ \sum a_i \chi_{A_i} \]

is called a step function. Let $E$ be a measurable set. Let

\[ \sum a_i \chi_{A_i} \]

be a step function. Then define

\[ \int_E \Big( \sum a_i \chi_{A_i} \Big) = \sum a_i \, m(A_i \cap E). \]
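
For step functions built from intervals, this definition is a one-line computation. The Python sketch below assumes, purely for illustration, that each $A_i$ and the set $E$ are intervals, so that $m(A_i \cap E)$ is just the length of an overlap; the function names are hypothetical.

```python
from fractions import Fraction

def overlap_length(a, b, c, d):
    """Measure of the intersection of the intervals [a, b] and [c, d]."""
    return max(0, min(b, d) - max(a, c))

def integrate_step(pieces, e_left, e_right):
    """Sum of a_i * m(A_i ∩ E) for E = [e_left, e_right].

    pieces is a list of (a_i, (left_i, right_i)) with the intervals A_i
    overlapping at most in endpoints, which does not change any measure.
    """
    return sum(value * overlap_length(left, right, e_left, e_right)
               for value, (left, right) in pieces)

pieces = [(Fraction(2), (Fraction(0), Fraction(1))),
          (Fraction(5), (Fraction(1), Fraction(3)))]
print(integrate_step(pieces, Fraction(0), Fraction(2)))
```

Here the step function is 2 on $[0,1]$ and 5 on $[1,3]$, and only the part of each piece lying inside $E = [0,2]$ contributes.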


We are about ready to define $\int_E f$.


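Definition 12.3.1 A function $f : E \to \mathbf{R} \cup (\infty) \cup (-\infty)$ is measurable if
its domain $E$ is measurable and if, for any fixed $\alpha \in \mathbf{R} \cup (\infty) \cup (-\infty)$,

\[ \{x \in E : f(x) > \alpha\} \]

is measurable.


Definition 12.3.2 Let $f$ be a measurable function on $E$. Then the Lebesgue
integral of $f$ on $E$ is

\[ \int_E f = \inf\Big\{ \int_E \sum a_i \chi_{A_i} : \text{for all } x \in E,\ \sum a_i \chi_{A_i}(x) \ge f(x) \Big\}. \]

In pictures:

[Figure: a step function over sets $A_1, \ldots, A_6$ making up $E$, lying above the graph of $f$.]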


Thus we use that we know the integral for single step functions and then 
approximate the desired integral by summing the integrals of these step 
functions. 

Every function that is integrable in beginning calculus is Lebesgue integrable.
The converse is false, with the canonical counterexample given
by the function $f : [0,1] \to [0,1]$ which is one at every rational and zero at
every irrational. The Lebesgue integral is

\[ \int_{[0,1]} f = 0, \]

which is one of the exercises at the end of the chapter, but this function
has no Riemann integral, which is an exercise in Chapter Two.


12.4 Convergence Theorems 


Not only does the Lebesgue integral allow us to integrate more functions
than the calculus class (Riemann) integral, it also provides the right conditions
to judge when we can conclude that

\[ \int \lim_{k \to \infty} f_k = \lim_{k \to \infty} \int f_k. \]

In fact, if such a result were not true, we would have chosen another definition
for the integral.
The typical theorem is of the form:


Theorem 12.4.1 (Lebesgue Dominating Convergence Thm.) Let
$g(x)$ be a Lebesgue integrable function on a measurable set $E$ and let $\{f_n(x)\}$
be a sequence of Lebesgue integrable functions on $E$ with $|f_n(x)| \le g(x)$ for
all $x$ in $E$ and such that there is a pointwise limit of the $f_n(x)$, i.e., there
is a function $f(x)$ with

\[ f(x) = \lim_{n \to \infty} f_n(x). \]

Then

\[ \int_E \lim_{n \to \infty} f_n(x) = \lim_{n \to \infty} \int_E f_n(x). \]


For a proof, see Royden’s Real Analysis [95], Chapter 4, in section 4. We
will just give a sketch here. Recall that if $f_k(x)$ converges uniformly to
$f(x)$, then we know from $\epsilon$ and $\delta$ real analysis that

\[ \lim_{k \to \infty} \int f_k(x) = \int f(x) \]

(i.e., the sequence of functions $f_k(x)$ converges uniformly to $f(x)$ if given
any $\epsilon > 0$, there exists a positive integer $N$ with

\[ |f(x) - f_k(x)| < \epsilon, \]

for all $x$ and all $k \ge N$. More quaintly, if we put an $\epsilon$-tube around $y = f(x)$,
eventually the $y = f_k(x)$ will fall inside this tube.) The idea in the proof is
that the $f_k(x)$ will indeed converge uniformly to $f(x)$, but only away from
a subset of $E$ of arbitrarily small measure. More precisely, the proposition
we need is:


Proposition 12.4.1 Let $\{f_k(x)\}$ be a sequence of measurable functions
on a Lebesgue measurable set $E$, with $m(E) < \infty$. Suppose that $\{f_k(x)\}$
converges pointwise to a function $f(x)$. Then given $\epsilon > 0$ and $\delta > 0$, there
is a positive integer $N$ and a measurable set $A \subset E$ with $|f_k(x) - f(x)| < \epsilon$
for all $x \in E - A$ and $k \ge N$ and $m(A) < \delta$.


The basic idea of the proof of the original theorem is now that

\begin{align*}
\int_E \lim_{n \to \infty} f_n &= \int_{E-A} \lim_{n \to \infty} f_n + \int_A \lim_{n \to \infty} f_n \\
&\le \lim_{n \to \infty} \int_E f_n + \max|g(x)| \, m(A).
\end{align*}

Since we can choose our set $A$ to have arbitrarily small measure, we can
let $m(A) \to 0$, which gives us our result.
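
To see why the dominating function matters, it helps to compare two concrete sequences on $[0,1]$. In the Python sketch below, both examples and all names are chosen here for illustration and are not from the text. The functions $x^n$ are bounded by the integrable function $g = 1$ and their integrals go to zero, matching the integral of the pointwise limit. The functions equal to $n$ on $(0, 1/n)$ and zero elsewhere also converge pointwise to zero, but no integrable $g$ dominates all of them, and their integrals stay at one. The integrals are approximated by a midpoint rule, so the printed values are approximate.

```python
def midpoint_integral(f, cells=20000):
    """Midpoint-rule approximation of the integral of f over [0, 1]."""
    h = 1.0 / cells
    return h * sum(f((i + 0.5) * h) for i in range(cells))

def dominated(n):
    """f_n(x) = x^n, bounded above by g(x) = 1 on [0, 1]."""
    return lambda x: x ** n

def escaping(n):
    """f_n(x) = n on (0, 1/n) and 0 elsewhere; no integrable bound exists."""
    return lambda x: float(n) if 0.0 < x < 1.0 / n else 0.0

for n in (1, 10, 50):
    print(n, midpoint_integral(dominated(n)), midpoint_integral(escaping(n)))
```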

The proposition can be seen to be true from the following. (After Royden’s
proof in Chapter 3, Section 6.) Set

\[ G_n = \{x \in E : |f_n(x) - f(x)| \ge \epsilon\}. \]

Set

\[ E_N = \bigcup_{n=N}^{\infty} G_n = \{x \in E : |f_n(x) - f(x)| \ge \epsilon \text{ for some } n \ge N\}. \]

Then $E_{N+1} \subset E_N$. Since we have $f_k(x)$ converging pointwise to $f(x)$,
we must have $\bigcap E_N$, which can be thought of as the limit of the sets $E_N$,
be empty. For measure to have any natural meaning, it should be true
that $\lim_{N \to \infty} m(E_N) = 0$. Thus given $\delta > 0$, we can find an $E_N$ with

\[ m(E_N) < \delta. \]

This is just an example of what can be accomplished with Lebesgue 
integration. Historically, the development of the Lebesgue integral in the 
early part of the twentieth century led quickly to many major advances. 
For example, until the 1920s, probability theory had no rigorous founda- 
tions. With the Lebesgue integral, and thus a correct way of measuring, 
the foundations were quickly laid. 


12.5 Books


One of the first texts on measure theory was by Halmos [54]. This is still an 
excellent book. The book that I learned measure theory from was Royden’s 
[95] and has been a standard since the 1960s. Rudin’s book [96] is another 
excellent text. Frank Jones, one of the best teachers of mathematics in the 
country, has recently written a fine text [70]. Folland’s recent text [40] is 
also quite good. 


12.6 Exercises 


1. Let E be any countable set of real numbers. Show that m(E) = 0. 
2. Let f(x) and g(x) be two Lebesgue integrable functions, both with 
domain the set $E$. Suppose that the set

\[ A = \{x \in E : f(x) \ne g(x)\} \]

has measure zero. What can be said about $\int_E f(x)$ and $\int_E g(x)$?
3. Let $f(x) = x$ for all real numbers $x$ between zero and one and let $f(x)$
be zero everywhere else. We know from calculus that

\[ \int_0^1 f(x)\,dx = \frac{1}{2}. \]

Show that this function $f(x)$ is Lebesgue integrable and that its Lebesgue
integral is still $\frac{1}{2}$.
4. On the interval $[0,1]$, define

\[ f(x) = \begin{cases} 1 & \text{if } x \text{ is rational} \\ 0 & \text{if } x \text{ is not rational.} \end{cases} \]

Show that $f(x)$ is Lebesgue integrable, with

\[ \int_0^1 f(x)\,dx = 0. \]


Chapter 13 


Fourier Analysis 






Basic Object: Real-valued functions with a fixed period 
Basic Maps: Fourier transforms 
Basic Goal: Finding bases for vector spaces of periodic functions 






13.1 Waves, Periodic Functions and 
Trigonometry 


Waves occur throughout nature, from water pounding a beach to sound 
echoing off the walls at a club to the evolution of an electron’s state in 
quantum mechanics. For these reasons, at the least, the mathematics of 
waves is important. In actual fact, the mathematical tools developed for 
waves, namely Fourier series (or harmonic analysis), touch on a tremen- 
dous number of different fields of mathematics. We will concentrate on 
only a small sliver and look at the basic definitions, how Hilbert spaces 
enter the scene, what a Fourier transform looks like and finally how Fourier
transforms can be used to help solve differential equations.
Of course, a wave should look like:

[Figure: a wave.]

or

[Figure: another wave.]

Both of these curves are described by periodic functions.


Definition 13.1.1 A function $f : \mathbf{R} \to \mathbf{R}$ is periodic with period $L$ if for
all $x$, $f(x + L) = f(x)$.


In other words, every $L$ units the function must start to repeat itself. The
quintessential periodic functions are the trigonometric functions $\cos(x)$ and
$\sin(x)$, each with period $2\pi$. Of course, functions like $\cos(\frac{2\pi x}{L})$ and $\sin(\frac{2\pi x}{L})$
are also periodic, both with period $L$.

Frequently people will say that a function $f(x)$ has period $L$ if not only
do we have that $f(x + L) = f(x)$, but also that there is no smaller number
than $L$ for which $f(x)$ is periodic. According to this convention, $\cos(x)$
will have period $2\pi$ but not period $4\pi$, despite the fact that, for all $x$,
$\cos(x + 4\pi) = \cos(x)$. We will not follow this convention.

The central result in beginning Fourier series is that almost every periodic
function is the, possibly infinite, sum of these trigonometric functions.
Thus, at some level, the various functions $\cos(\frac{2\pi x}{L})$ and $\sin(\frac{2\pi x}{L})$ are not
merely examples of periodic functions; they generate all periodic functions.


13.2 Fourier Series 


Now to see how we can write a periodic function as an (infinite) sum of these
cosines and sines. First suppose that we have a function $f : [-\pi, \pi] \to \mathbf{R}$
that has already been written as a series of sines and cosines, namely as

\[ a_0 + \sum_{n=1}^{\infty} (a_n \cos(nx) + b_n \sin(nx)). \]

We want to see how we can naively compute the various coefficients $a_k$ and
$b_k$, ignoring all questions of convergence for these infinite series (convergence
issues are faced in the next section). For any given $k$, consider


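\begin{align*}
\int_{-\pi}^{\pi} f(x)\cos(kx)\,dx &= \int_{-\pi}^{\pi} \Big(a_0 + \sum_{n=1}^{\infty} (a_n\cos(nx) + b_n\sin(nx))\Big)\cos(kx)\,dx \\
&= \int_{-\pi}^{\pi} a_0 \cos(kx)\,dx \\
&\quad + \sum_{n=1}^{\infty} a_n \int_{-\pi}^{\pi} \cos(nx)\cos(kx)\,dx \\
&\quad + \sum_{n=1}^{\infty} b_n \int_{-\pi}^{\pi} \sin(nx)\cos(kx)\,dx.
\end{align*}

By direct calculation we have, for $n \ge 1$,

\begin{align*}
\int_{-\pi}^{\pi} \cos(kx)\,dx &= \begin{cases} 2\pi & \text{if } k = 0 \\ 0 & \text{if } k \ne 0 \end{cases} \\
\int_{-\pi}^{\pi} \cos(nx)\cos(kx)\,dx &= \begin{cases} \pi & \text{if } k = n \\ 0 & \text{if } k \ne n \end{cases} \\
\int_{-\pi}^{\pi} \sin(nx)\cos(kx)\,dx &= 0.
\end{align*}

Then we would expect

\[ \int_{-\pi}^{\pi} f(x)\cos(kx)\,dx = \begin{cases} 2\pi a_0 & \text{if } k = 0 \\ \pi a_k & \text{if } k \ne 0. \end{cases} \]

By a similar calculation, using, though, the integrals $\int_{-\pi}^{\pi} f(x)\sin(nx)\,dx$,
we can get similar formulas for the $b_n$. This suggests how we could try to
write any random periodic function as the infinite sum of sines and cosines: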
write any random periodic function as the infinite sum of sines and cosines: 


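Definition 13.2.1 The Fourier series for a function $f : [-\pi, \pi] \to \mathbf{R}$ is

\[ a_0 + \sum_{n=1}^{\infty} (a_n \cos(nx) + b_n \sin(nx)) \]

where

\[ a_0 = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x)\,dx \]

and

\[ a_n = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x)\cos(nx)\,dx \]

and

\[ b_n = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x)\sin(nx)\,dx. \]

The coefficients $a_i$ and $b_i$ are called the amplitudes, or Fourier coefficients
for the Fourier series.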
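
These coefficient formulas can be approximated numerically. The Python sketch below is a rough illustration with hypothetical function names. It assumes the normalization $a_0 = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)\,dx$, which matches the value $2\pi a_0$ found for $k = 0$ in the computation above, and it replaces each integral by a midpoint rule. The test function $f(x) = x$ is chosen because its coefficients are easy to find by hand: every $a_n$ vanishes and $b_n = 2(-1)^{n+1}/n$.

```python
import math

def integrate(g, a, b, cells=20000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / cells
    return h * sum(g(a + (i + 0.5) * h) for i in range(cells))

def fourier_coefficients(f, n_max):
    """Return (a_0, [a_1, ..., a_n_max], [b_1, ..., b_n_max]) for f on [-pi, pi]."""
    lo, hi = -math.pi, math.pi
    a0 = integrate(f, lo, hi) / (2 * math.pi)
    a = [integrate(lambda x: f(x) * math.cos(n * x), lo, hi) / math.pi
         for n in range(1, n_max + 1)]
    b = [integrate(lambda x: f(x) * math.sin(n * x), lo, hi) / math.pi
         for n in range(1, n_max + 1)]
    return a0, a, b

a0, a, b = fourier_coefficients(lambda x: x, 3)
print(a0, a, b)
```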

Of course, such a definition can only be applied to those functions for
which the above integrals exist. The punchline, as we will see, is that most
functions are actually equal to their Fourier series.

There are other ways of writing the Fourier series for a function. For
example, using that $e^{ix} = \cos x + i \sin x$, for real numbers $x$, the Fourier
series can also be expressed by

\[ \sum_{n=-\infty}^{\infty} C_n e^{inx}, \]

where

\[ C_n = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x) e^{-inx}\,dx. \]

The $C_n$ are also called the amplitudes or Fourier coefficients. In fact, for
the rest of this section, but not for the rest of the chapter, we will write
our Fourier series as $\sum_{n=-\infty}^{\infty} C_n e^{inx}$.

The hope (which can be almost achieved) is that the function $f(x)$ and
its Fourier series will be equal. For this, we must first put a slight restriction
on the type of function we allow.


Theorem 13.2.1 Let $f : [-\pi, \pi] \to \mathbf{R}$ be a square-integrable function
(i.e.,

\[ \int_{-\pi}^{\pi} |f(x)|^2\,dx < \infty). \]

Then at almost all points,

\[ f(x) = \sum_{n=-\infty}^{\infty} C_n e^{inx}, \]

its Fourier series.


Note that this theorem contains within it the fact that the Fourier series 
of a square-integrable function will converge. Further, the above integral is 
the Lebesgue integral. Recall that almost everywhere means at all points 
except possibly for points in a set of measure zero. As seen in exercise 2 in 
Chapter Twelve, two functions that are equal almost everywhere will have 
equal integrals. Thus, morally, a square-integrable function is equal to its 
Fourier series. 

What the Fourier series does is associate to a function an infinite sequence
of numbers, the amplitudes. It explicitly gives how a function is
the (infinite) sum of complex waves $e^{inx}$. Thus there is a map $\mathfrak{F}$ from
square-integrable functions to infinite sequences of complex numbers,

\[ \mathfrak{F} : \text{Vector space of square-integrable functions} \to \text{Certain vector space of infinite sequences of complex numbers} \]

or

\[ \mathfrak{F} : \text{Vector space of square-integrable functions} \to \text{Vector space of infinite sequences of amplitudes} \]

which, by the above theorem, is one-to-one, modulo equivalence of functions
almost everywhere.

We now translate these statements into the language of Hilbert spaces,
an extremely important class of vector space. Before giving the definition
of a Hilbert space, a few definitions must be made.


Definition 13.2.2 An inner product $\langle \cdot, \cdot \rangle : V \times V \to \mathbf{C}$ on a complex
vector space $V$ is a map such that


1. $\langle a v_1 + b v_2, v_3 \rangle = a\langle v_1, v_3 \rangle + b\langle v_2, v_3 \rangle$ for all complex numbers $a, b \in \mathbf{C}$
and for all vectors $v_1, v_2, v_3 \in V$.


2. $\langle v, w \rangle = \overline{\langle w, v \rangle}$ for all $v, w \in V$.


3. $\langle v, v \rangle \ge 0$ for all $v \in V$ and $\langle v, v \rangle = 0$ only if $v = 0$.


Note that since $\langle v, v \rangle = \overline{\langle v, v \rangle}$, we must have, for all vectors $v$, that $\langle v, v \rangle$
is a real number. Hence the third requirement that $\langle v, v \rangle \ge 0$ makes sense.

To some extent, this is the complex vector space analogue of the dot
product on $\mathbf{R}^n$. In fact, the basic example of an inner product on $\mathbf{C}^n$ is
the following: let

\begin{align*}
v &= (v_1, \ldots, v_n) \\
w &= (w_1, \ldots, w_n)
\end{align*}

be two vectors in $\mathbf{C}^n$. Define

\[ \langle v, w \rangle = \sum_{k=1}^{n} v_k \overline{w_k}. \]

It can be checked that this is an inner product on $\mathbf{C}^n$.
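
Part of that check can be done by machine on sample vectors. The Python sketch below (the function names are illustrative) implements $\sum v_k \overline{w_k}$ with Python's built-in complex numbers and tries the symmetry and positivity conditions on one pair of vectors. Passing on an example is of course not a proof of the three axioms.

```python
def inner(v, w):
    """The inner product sum of v_k * conj(w_k) on C^n."""
    return sum(vk * wk.conjugate() for vk, wk in zip(v, w))

def norm(v):
    """Induced length of v, the square root of <v, v>."""
    return inner(v, v).real ** 0.5

v = [1 + 2j, 3j]
w = [2 - 1j, 1 + 1j]

print(inner(v, w))              # <v, w>
print(inner(w, v).conjugate())  # conjugate of <w, v>, expected to agree
print(inner(v, v))              # real and nonnegative
```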


Definition 13.2.3 Given an inner product $\langle \cdot, \cdot \rangle : V \times V \to \mathbf{C}$, the induced
norm on $V$ is given by:

\[ |v| = \langle v, v \rangle^{1/2}. \]


In an inner product space, two vectors are orthogonal if their inner product
is zero (which is what happens for the dot product in $\mathbf{R}^n$). Further, we
can interpret the norm of a vector as a measure of the distance from the
vector to the origin of the vector space. But then, with a notion of distance,
we have a metric and hence a topology on $V$, as seen in Chapter Four, by
setting

\[ \rho(v, w) = |v - w|. \]


Definition 13.2.4 A metric space $(X, \rho)$ is complete if every Cauchy sequence
converges, meaning that for any sequence $\{v_i\}$ in $X$ with $\rho(v_i, v_j) \to 0$
as $i, j \to \infty$, there is an element $v$ in $X$ with $v_i \to v$ (i.e., $\rho(v, v_i) \to 0$
as $i \to \infty$).


Definition 13.2.5 A Hilbert space is an inner product space which is com- 
plete with respect to the topology defined by the inner product. 


There is the following natural Hilbert space.


Proposition 13.2.1 The set of Lebesgue square-integrable functions

\[ L^2[-\pi, \pi] = \Big\{ f : [-\pi, \pi] \to \mathbf{C} \ \Big|\ \int_{-\pi}^{\pi} |f|^2 < \infty \Big\} \]

is a Hilbert space, with inner product

\[ \langle f, g \rangle = \int_{-\pi}^{\pi} f(x) \overline{g(x)}\,dx. \]

This vector space is denoted by $L^2[-\pi, \pi]$.


We need to allow Lebesgue integrable functions in the above definition in
order for the space to be complete.

In general, there is, for each real number $p \ge 1$ and any interval $[a, b]$,
the vector space:

\[ L^p[a, b] = \Big\{ f : [a, b] \to \mathbf{R} \ \Big|\ \int_a^b |f(x)|^p\,dx < \infty \Big\}. \]

The study of these vector spaces is the start of Banach Space theory.
Another standard example of a Hilbert space is the space of square-integrable
sequences, denoted by $l^2$:


Proposition 13.2.2 The set of sequences of complex numbers

\[ l^2 = \Big\{ (a_0, a_1, \ldots) \ \Big|\ \sum_{j=0}^{\infty} |a_j|^2 < \infty \Big\} \]

is a Hilbert space with inner product

\[ \langle (a_0, a_1, \ldots), (b_0, b_1, \ldots) \rangle = \sum_{j=0}^{\infty} a_j \overline{b_j}. \]

We can now restate the fact that square-integrable functions are equal to
their Fourier series, almost everywhere, into the language of Hilbert spaces.


Theorem 13.2.2 For the Hilbert space $L^2[-\pi, \pi]$, the functions

\[ \frac{1}{\sqrt{2\pi}} e^{inx} \]

are an orthonormal (Schauder) basis, meaning that each has length one,
that they are pairwise orthogonal and that each element of $L^2[-\pi, \pi]$ is the
unique infinite linear combination of the basis elements.


Note that we had to use the technical term of Schauder basis. These are not 
quite the bases defined in Chapter One. There we needed each element in 
the vector space to be a unique finite linear combination of basis elements. 
While such do exist for Hilbert spaces, they do not seem to be of much 
use (the proof of their existence actually stems from the Axiom of Choice). 
The more natural bases are the above, for which we still require uniqueness 
of the coefficients but now allow infinite sums. 

While the proof that the functions $\frac{1}{\sqrt{2\pi}} e^{inx}$ are orthonormal is simply
an integral calculation, the proof that they form a basis is much harder
and is in fact a restatement that a square-integrable function is equal to its
Fourier series, namely:


Theorem 13.2.3 For any function $f(x)$ in the Hilbert space $L^2[-\pi, \pi]$, we
have

\[ f(x) = \sum_{n=-\infty}^{\infty} \Big\langle f(x), \frac{1}{\sqrt{2\pi}} e^{inx} \Big\rangle \frac{1}{\sqrt{2\pi}} e^{inx} \]

almost everywhere.


Hence, the coefficients of a function’s Fourier series are simply the inner
product of $f(x)$ with each basis vector, exactly as with the dot product
for vectors in $\mathbf{R}^3$ with respect to the standard basis
$\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}$, $\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}$ and
$\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}$. Further, we can view the association of a function with its Fourier
coefficients (with its amplitudes) as a linear transformation

\[ L^2[-\pi, \pi] \to l^2. \]
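
The orthonormality of the functions $\frac{1}{\sqrt{2\pi}} e^{inx}$, and the description of Fourier coefficients as inner products, can both be tested numerically. The Python sketch below uses hypothetical function names and approximates the inner product $\int_{-\pi}^{\pi} f(x)\overline{g(x)}\,dx$ by a midpoint rule, so all values are approximate. For $f(x) = x$ a hand computation gives $-i\sqrt{2\pi}$ as the inner product with the $n = 1$ basis function.

```python
import cmath
import math

def inner_L2(f, g, cells=4000):
    """Midpoint-rule approximation of the integral of f * conj(g) over [-pi, pi]."""
    h = 2 * math.pi / cells
    total = 0j
    for i in range(cells):
        x = -math.pi + (i + 0.5) * h
        total += f(x) * g(x).conjugate()
    return h * total

def basis(n):
    """The function x -> e^{inx} / sqrt(2 pi)."""
    return lambda x: cmath.exp(1j * n * x) / math.sqrt(2 * math.pi)

print(abs(inner_L2(basis(3), basis(3))))    # length squared of a basis function
print(abs(inner_L2(basis(3), basis(-2))))   # two different basis functions
print(inner_L2(lambda x: x, basis(1)))      # coefficient of f(x) = x for n = 1
```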


Naturally enough, these formulas and theorems have versions for functions
with period $2L$, when the Fourier series will be:


Definition 13.2.6 A function $f : [-L, L] \to \mathbf{R}$ has Fourier series

\[ \sum_{n=-\infty}^{\infty} C_n e^{\frac{i\pi n x}{L}} \]

where

\[ C_n = \frac{1}{2L} \int_{-L}^{L} f(x) e^{-\frac{i\pi n x}{L}}\,dx. \]


We have ignored, so far, a major subtlety, namely that a Fourier series
is an infinite series. The next section deals with these issues.


13.3 Convergence Issues 


Already during the 1700s mathematicians were trying to see if a given 
function was equal to its Fourier series, though in actual fact the theoretical 
tools needed to talk about such questions were not yet available, leading to 
some nonsensical statements. By the end of the 1800s, building on work of 
Dirichlet, Riemann and Gibbs, much more was known. 

This section will state some of these convergence theorems. The proofs
are hard. For notation, let our function be $f(x)$ and denote its Fourier
series by

\[ a_0 + \sum_{n=1}^{\infty} (a_n \cos(nx) + b_n \sin(nx)). \]

We want to know what this series converges to pointwise and to know when
the convergence is uniform.


Theorem 13.3.1 Let $f(x)$ be continuous and periodic with period $2\pi$. Then

\[ \lim_{N \to \infty} \int_{-\pi}^{\pi} \Big( f(x) - \Big[ a_0 + \sum_{n=1}^{N} (a_n \cos(nx) + b_n \sin(nx)) \Big] \Big)\,dx = 0. \]


Thus for continuous functions, the area under the curve

\[ y = \text{partial sum of the Fourier series} \]

will approach the area under the curve $y = f(x)$. We say that the Fourier
series converges in the mean to the function $f(x)$.

This is telling us little about what the Fourier series converges to at
any given fixed point $x$. Now assume that $f(x)$ is piecewise smooth on the
closed interval $[-\pi, \pi]$, meaning that $f(x)$ is piecewise continuous, has a
derivative at all but a finite number of points and that the derivative is
piecewise continuous. For such functions, we define the one sided limits

\[ f(x+) = \lim_{h \to 0,\ h > 0} f(x + h) \]

and

\[ f(x-) = \lim_{h \to 0,\ h > 0} f(x - h). \]


Theorem 13.3.2 If $f(x)$ is piecewise smooth on $[-\pi, \pi]$, then for all points
$x$, the Fourier series converges pointwise to the function

\[ \frac{f(x+) + f(x-)}{2}. \]


At points where $f(x)$ is continuous, the one sided limits are of course
each equal to $f(x)$. Thus for a continuous, piecewise smooth function,
the Fourier series will converge pointwise to the function.

[Figure: a graph of $f(x)$.]


But when $f$ is not continuous, even if it is piecewise smooth, the above
pointwise convergence is far from uniform. Here the Gibbs’ phenomenon
becomes relevant. Denote the partial sum of the Fourier series by

\[ S_N(x) = a_0 + \sum_{n=1}^{N} (a_n \cos(nx) + b_n \sin(nx)) \]

and suppose that $f$ has a point of discontinuity at $x_0$. While the partial
sums $S_N(x)$ do converge to $\frac{f(x+) + f(x-)}{2}$, the rate of convergence at different
$x$ is wildly different. In fact, the better the convergence is at the point of
discontinuity $x_0$, the worse it is near $x_0$. In pictures, what happens is:

[Figure: graphs of two partial sums of the Fourier series plotted against $f(x)$, overshooting $f(x)$ near the discontinuity.]


Note how the partial sums soar away from the function f(x), destroying 
any hope of uniform convergence. 
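To get a feel for this overshoot numerically, here is a minimal sketch. It assumes, purely for illustration, the square wave equal to −1 on (−π, 0) and +1 on (0, π), whose Fourier series is (4/π) Σ sin(nx)/n over odd n. This example function and the helper names `square_wave_partial_sum` and `peak_value` are not from the text.

```python
import math

def square_wave_partial_sum(x, num_terms):
    """Partial sum of the Fourier series of the square wave (-1 on (-pi, 0), +1 on (0, pi)).

    The series is (4/pi) * sum of sin(n x)/n over odd n; num_terms odd harmonics are kept.
    """
    return (4 / math.pi) * sum(
        math.sin((2 * j + 1) * x) / (2 * j + 1) for j in range(num_terms)
    )

def peak_value(num_terms, grid_points=2000):
    """Largest value of the partial sum on a uniform grid inside (0, pi)."""
    xs = (math.pi * i / grid_points for i in range(1, grid_points))
    return max(square_wave_partial_sum(x, num_terms) for x in xs)

# The function itself equals 1 on (0, pi); compare with the peaks of the partial sums.
for num_terms in (5, 20, 80):
    print(num_terms, round(peak_value(num_terms), 4))
```

In this example the peak of the partial sums stays at roughly 1.18 however many terms are kept, instead of settling down to the value 1 of the function, which is the behavior that rules out uniform convergence.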

Luckily this does not happen if the function is continuous and piecewise 
smooth. 


Theorem 13.3.3 Let f(x) be continuous and piecewise smooth on [−π, π],
with f(−π) = f(π). Then the Fourier series will converge uniformly to f(x).


Thus for reasonably decent functions, we can safely substitute their Fourier 
series and still do basic calculus. 

For proofs of these results, see Harry F. Davis’ Fourier Series and Or- 
thogonal Functions [24], chapter 3. 


13.4 Fourier Integrals and Transforms 
Most functions f : R → R will of course not be periodic, no matter what
period L is chosen. But all functions, in some sense, are infinitely periodic.
The Fourier integral is the result when we let the period L approach infinity
(having as a consequence that $\frac{n\pi x}{L}$ approaches zero). The summation sign
in the Fourier series becomes an integral. The result is:


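Definition 13.4.1 Let f : R → R be a function. Its Fourier integral is

\[ \int_0^{\infty} \bigl(a(t) \cos(tx) + b(t) \sin(tx)\bigr)\, dt \]

where

\[ a(t) = \frac{1}{\pi} \int_{-\infty}^{\infty} f(x) \cos(tx)\, dx \]

and

\[ b(t) = \frac{1}{\pi} \int_{-\infty}^{\infty} f(x) \sin(tx)\, dx. \]

The Fourier integral can be rewritten as

\[ \int_{-\infty}^{\infty} C(t) e^{itx}\, dt, \]

where

\[ C(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} f(x) e^{-itx}\, dx. \]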

There are other forms, all equivalent up to constants. 
The main theorem is: 


Theorem 13.4.1 Let f : R → R be integrable (i.e., $\int_{-\infty}^{\infty} |f(x)|\, dx < \infty$).
Then, off of a set of measure zero, the function f(x) is equal to its Fourier 
integral. 


As with Fourier series, this integral is the Lebesgue integral. Further, again 
recall that by the term ‘a set of measure zero’, we mean a set of Lebesgue 
measure zero and that throughout analysis, sets of measure zero are rou- 
tinely ignored. 

As we will see, a large part of the usefulness of Fourier integrals lies in 
the existence of the Fourier transform. 


Definition 13.4.2 The Fourier transform of an integrable function f(x)
is:

\[ \mathfrak{F}(f(x))(t) = \int_{-\infty}^{\infty} f(x) e^{-itx}\, dx. \]

The idea is that the Fourier transform can be viewed as corresponding to
the coefficients $a_n$ and $b_n$ of a Fourier series and hence to the amplitude of
the wave. By a calculation, we see that

\[ f(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \mathfrak{F}(f(x))(t)\, e^{itx}\, dt, \]

provided we place suitable restrictions on the function f(x). Thus indeed
the Fourier transform is the continuous analogue of the amplitudes for
Fourier series, in that we are writing the original function f(x) as a sum (an
integral) of the complex waves $e^{itx}$ with coefficients given by the transform.
(Also, the constant $\frac{1}{2\pi}$ is not fixed in stone; what is required is that the
product of the constants in front of the integral in the Fourier transform
(here it is 1) and the above integral be equal to $\frac{1}{2\pi}$.)

As we will see in the next section, in applications you frequently know 
the Fourier transform before you know the original function. 


But for now we can view the Fourier transform as a one-to-one map
$\mathfrak{F}$ : Vector Space of Functions → Different Vector Space of Functions.
Thinking of the Fourier transform as an amplitude, we can rewrite this as:
$\mathfrak{F}$ : Position Space → Amplitude Space.


Following directly from the linearity of the Lebesgue integral, this map is 
linear. 

Much of the power of Fourier transforms is that there is a dictionary 
between the algebraic and analytic properties of the functions in one of 
these vector spaces with those of the other vector space. 


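Proposition 13.4.1 Let f(x,t) be an integrable function with f(x,t) → 0
as x → ±∞. Let $\mathfrak{F}(f(x))(u)$ denote the Fourier transform with respect to
the variable x. Then

i) $\mathfrak{F}\{\frac{\partial f}{\partial x}\}(u) = iu\, \mathfrak{F}(f(x))(u)$.

ii) $\mathfrak{F}\{\frac{\partial^2 f}{\partial x^2}\}(u) = -u^2\, \mathfrak{F}(f(x))(u)$.

iii) $\mathfrak{F}\{\frac{\partial f(x,t)}{\partial t}\}(u) = \frac{\partial}{\partial t}\{\mathfrak{F}(f(x,t))\}(u)$.

We will show (i), where the key tool is simply integration by parts, and
sketch the proof of (iii).
By the definition of the Fourier transform, we have

\[ \mathfrak{F}\Bigl\{\frac{\partial f}{\partial x}\Bigr\}(u) = \int_{-\infty}^{\infty} \frac{\partial f}{\partial x}\, e^{-iux}\, dx, \]

which, by integration by parts, is

\[ e^{-iux} f(x,t) \Big|_{-\infty}^{\infty} + iu \int_{-\infty}^{\infty} f(x,t) e^{-iux}\, dx = iu \int_{-\infty}^{\infty} f(x,t) e^{-iux}\, dx, \]

since f(x,t) → 0 as x → ±∞, and hence equals

\[ iu\, \mathfrak{F}(f). \]

For (iii), we have

\[ \mathfrak{F}\Bigl\{\frac{\partial f(x,t)}{\partial t}\Bigr\}(u) = \int_{-\infty}^{\infty} \frac{\partial f(x,t)}{\partial t}\, e^{-iux}\, dx. \]

Since this integral is with respect to x and since the partial derivative is
with respect to t, this is equal to:

\[ \frac{\partial}{\partial t} \int_{-\infty}^{\infty} f(x,t) e^{-iux}\, dx. \]

But this is just:

\[ \frac{\partial}{\partial t}\{\mathfrak{F}(f(x,t))\}(u), \]

and thus (iii) has been shown. □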
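Part (i) can be spot-checked numerically. The sketch below assumes the convention that the transform of f at u is the integral of f(x)e^{-iux} over the real line, and uses the illustrative choice f(x) = e^{-x²/2}, which is not from the text. The helper `fourier_transform` is a hypothetical name for a plain trapezoid-rule approximation on a finite interval.

```python
import cmath
import math

def fourier_transform(f, u, half_width=12.0, steps=4000):
    """Trapezoid-rule approximation of the integral of f(x) e^{-iux} over [-half_width, half_width]."""
    h = 2 * half_width / steps
    total = 0j
    for j in range(steps + 1):
        x = -half_width + j * h
        weight = 0.5 if j in (0, steps) else 1.0   # endpoints get half weight
        total += weight * f(x) * cmath.exp(-1j * u * x)
    return total * h

def f(x):
    return math.exp(-x * x / 2)        # decays to 0 as x -> +/- infinity

def df(x):
    return -x * math.exp(-x * x / 2)   # derivative of f with respect to x

u = 1.7
lhs = fourier_transform(df, u)          # transform of the derivative
rhs = 1j * u * fourier_transform(f, u)  # iu times the transform of f
print(abs(lhs - rhs))                   # should be very close to zero
```

The agreement is only as good as the quadrature and the truncation of the integral, so this is a sanity check rather than a proof.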

In the next section we will use this proposition to reduce the solving of a 
partial differential equation to the solving of an ordinary differential equa- 
tion (which can almost always be solved). We need one more preliminary 
definition. 


Definition 13.4.3 The convolution of two functions f(x) and g(x) is

\[ (f * g)(x) = \int_{-\infty}^{\infty} f(u) g(x - u)\, du. \]

By a direct calculation, the Fourier transform of a convolution is the product
of the Fourier transforms of each function, i.e.,

\[ \mathfrak{F}(f * g) = \mathfrak{F}(f) \cdot \mathfrak{F}(g). \]
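The direct calculation can be sketched as follows. This is a sketch, assuming that the convolution of f and g at x is the integral of f(s)g(x − s) over the real line, that the transform uses the kernel e^{-iux}, and that f and g are nice enough for the order of integration to be exchanged.

```latex
\begin{align*}
\mathfrak{F}(f * g)(u)
  &= \int_{-\infty}^{\infty} \left( \int_{-\infty}^{\infty} f(s)\, g(x - s)\, ds \right) e^{-iux}\, dx \\
  &= \int_{-\infty}^{\infty} f(s)\, e^{-ius} \left( \int_{-\infty}^{\infty} g(x - s)\, e^{-iu(x - s)}\, dx \right) ds \\
  &= \int_{-\infty}^{\infty} f(s)\, e^{-ius}\, \mathfrak{F}(g)(u)\, ds \\
  &= \mathfrak{F}(f)(u) \cdot \mathfrak{F}(g)(u).
\end{align*}
```

The third line uses the substitution $w = x - s$ in the inner integral, which turns it into the transform of g and removes its dependence on s.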


Thus the Fourier transform translates a convolution in the original vec- 
tor space into a product in the image vector space. This will be important 
when trying to solve partial differential equations, in that at some stage 
we will have the product of two Fourier transforms, which we can now 
recognize as the Fourier transform of a single function, the convolution.


13.5 Solving Differential Equations


The idea is that the Fourier transform will translate a differential equation 
into a simpler one (one that is, vaguely, more algebraic). We will apply 
this technique to solving the partial differential equation that describes the 
flow of heat. Here the Fourier transform will change the partial differential 
equation into an ordinary differential equation, which can be solved. Once 
we know the Fourier transform, we can almost always recover the original 
function. 

In the next chapter, we will derive the heat equation, but for now we 
will take as a given that the flow of heat through an infinitely thin, long 
bar is described by 


\[ \frac{\partial h}{\partial t} = k \cdot \frac{\partial^2 h}{\partial x^2}, \]

where h(x, t) denotes the temperature at time t and position x and where k
is a given constant. We start with an initial temperature distribution f(x).
Thus we want to find a function h(x, t) that satisfies

\[ \frac{\partial h}{\partial t} = k \cdot \frac{\partial^2 h}{\partial x^2}, \]

given the initial condition,

\[ h(x, 0) = f(x). \]


Further, assume that as x → ±∞, we know that f(x) → 0. This just
means basically that the bar will initially have zero temperature for large
values of x. For physical reasons we assume that whatever is the eventual
solution h(x, t), we have that h(x, t) → 0 as x → ±∞.

Take the Fourier transform with respect to the variable x of the partial 
differential equation 





\[ \frac{\partial h}{\partial t} = k \cdot \frac{\partial^2 h}{\partial x^2}, \]

to get

\[ \mathfrak{F}\Bigl(\frac{\partial h(x,t)}{\partial t}\Bigr)(u) = \mathfrak{F}\Bigl(k \cdot \frac{\partial^2 h(x,t)}{\partial x^2}\Bigr)(u), \]

yielding

\[ \frac{\partial}{\partial t} \mathfrak{F}(h(x,t))(u) = -k u^2\, \mathfrak{F}(h(x,t))(u). \]

Now $\mathfrak{F}(h(x,t))(u)$ is a function of the variables u and t. The x is a mere
symbol, a ghost reminding us of the original PDE.

Treat the variable u as a constant, which is of course what we are doing
when we take the partial derivative with respect to t. Then we can write
the above equation in the form of an ODE:

\[ \frac{d}{dt} \mathfrak{F}(h(x,t))(u) = -k u^2\, \mathfrak{F}(h(x,t))(u). \]


The solution to this ODE, as will be discussed in the next section but which 
can also be seen directly by (unpleasant) inspection, is: 


\[ \mathfrak{F}(h(x,t))(u) = C(u) e^{-k u^2 t}, \]

where C(u) is a function of the variable u alone and hence, as far as the
variable t is concerned, is a constant. We will first find this C(u) by using
the initial temperature f(x). We know that h(x, 0) = f(x). Then for t = 0,

\[ \mathfrak{F}(h(x,0))(u) = \mathfrak{F}(f(x))(u). \]

When t = 0, the function $C(u) e^{-k u^2 t}$ is just C(u) alone. Thus when t = 0,
we have

\[ \mathfrak{F}(f(x))(u) = C(u). \]

Since f(x) is assumed to be known, we can actually compute its Fourier
transform and thus we can compute C(u). Thus

\[ \mathfrak{F}(h(x,t))(u) = \mathfrak{F}(f(x))(u) \cdot e^{-k u^2 t}. \]


Assume for a moment that we know a function g(x,t) such that its 
Fourier transform with respect to x is: 


\[ \mathfrak{F}(g(x,t))(u) = e^{-k u^2 t}. \]

If such a function g(x, t) exists, then

\[ \mathfrak{F}(h(x,t))(u) = \mathfrak{F}(f(x))(u) \cdot \mathfrak{F}(g(x,t))(u). \]

But a product of two Fourier transforms can be written as the Fourier
transform of a convolution. Thus

\[ \mathfrak{F}(h(x,t))(u) = \mathfrak{F}(f(x) * g(x,t))(u). \]

Since we can recover the original function from its Fourier transform, this
means that the solution to the heat equation is

\[ h(x,t) = f(x) * g(x,t). \]


Thus we can solve the heat equation if we can find this function g(x, t) whose
Fourier transform is $e^{-k u^2 t}$. Luckily we are not the first people to attempt
this approach. Over the years many such calculations have been done and
tables have been prepared, listing such functions. (To do it oneself, one
needs to define the notion of the inverse Fourier transform and then to take
the inverse Fourier transform of the function $e^{-k u^2 t}$. While no harder than
the Fourier transform, we will not do it.) However it is done, we can figure
out that

\[ \mathfrak{F}\Bigl( \frac{1}{\sqrt{4\pi k t}}\, e^{-\frac{x^2}{4kt}} \Bigr) = e^{-k u^2 t}. \]

Thus the solution of the heat equation will be:

\[ h(x, t) = f(x) * \frac{1}{\sqrt{4\pi k t}}\, e^{-\frac{x^2}{4kt}}. \]
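As a rough numerical illustration of this recipe, the sketch below convolves an initial temperature with the Gaussian $\frac{1}{\sqrt{4\pi k t}} e^{-x^2/(4kt)}$, whose transform is $e^{-ku^2t}$ under the convention used here. The initial temperature f(x) = e^{-x²}, the parameter values, and the helper names `heat_kernel` and `solve_heat` are illustrative assumptions, not from the text. For this particular f the convolution of two Gaussians has a closed form, which is used as a check.

```python
import math

def heat_kernel(x, t, k):
    """g(x, t) = exp(-x^2 / (4 k t)) / sqrt(4 pi k t)."""
    return math.exp(-x * x / (4 * k * t)) / math.sqrt(4 * math.pi * k * t)

def solve_heat(f, x, t, k, half_width=20.0, steps=4000):
    """Riemann-sum approximation of h(x, t) = integral of f(s) g(x - s, t) ds."""
    ds = 2 * half_width / steps
    total = 0.0
    for j in range(steps + 1):
        s = -half_width + j * ds
        total += f(s) * heat_kernel(x - s, t, k)
    return total * ds

def initial_temperature(x):
    return math.exp(-x * x)   # tends to 0 as x -> +/- infinity

k, t, x = 1.0, 0.5, 0.3
approx = solve_heat(initial_temperature, x, t, k)
# Closed form for this Gaussian initial condition.
exact = math.exp(-x * x / (1 + 4 * k * t)) / math.sqrt(1 + 4 * k * t)
print(approx, exact)
```

Truncating the integral to a finite interval is harmless here only because both Gaussians are negligible far from the origin; other initial temperatures may need a wider interval or a finer step.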








13.6 Books 


Since Fourier analysis has applications ranging from CAT scans to ques- 
tions about the distribution of the prime numbers, it is not surprising that 
there are books on Fourier series aimed at wildly different audiences and 
levels of mathematical maturity. Barbara Hubbard’s The World Accord- 
ing to Wavelets [63] is excellent. The first half is a gripping nontechnical 
description of Fourier series. The second half deals with the rigorous math- 
ematics. Wavelets, by the way, are a recent innovation in Fourier series that 
have had profound practical applications. A solid, traditional introduction 
is given by Davis in his Fourier Series and Orthogonal Functions [24]. A 
slightly more advanced text is Folland’s Fourier Analysis and its Applica- 
tions [38]. A brief, interesting book is Seeley’s An Introduction to Fourier 
Series and Integrals [98]. An old fashioned but readable book is Jackson’s 
Fourier Series and Orthogonal Polynomials [67]. For the hardcore student, 
the classic inspiration in the subject since the 1930s has been Zygmund’s 
Trigonometric Series [116]. 


13.7 Exercises 


1. On the vector space 


\[ L^2[-\pi, \pi] = \Bigl\{ f : [-\pi, \pi] \to \mathbb{C} \;\Big|\; \int_{-\pi}^{\pi} |f|^2 < \infty \Bigr\}, \]

show that

\[ \langle f, g \rangle = \int_{-\pi}^{\pi} f(x)\, \overline{g(x)}\, dx \]


is indeed an inner product, as claimed in this chapter. 
2. Using Fourier transforms, reduce the solution of the wave equation 


\[ \frac{\partial^2 y}{\partial t^2} = k \frac{\partial^2 y}{\partial x^2}, \]


with k a constant, to solving an ordinary (no partial derivatives involved) 
differential equation. 
3. Consider the functions 
\[ f_n(x) = \begin{cases} 2n & \text{if } -\frac{1}{n} < x < \frac{1}{n} \\ 0 & \text{otherwise.} \end{cases} \]

Compute the Fourier transforms of each of the functions $f_n(x)$. Graph each
of the functions $f_n$ and each of the Fourier transforms. Compare the graphs
and draw conclusions.


Chapter 14 


Differential Equations 


Basic Object: Differential Equations 


Basic Goal: Finding Solutions to Differential Equations 





14.1 Basics 


A differential equation is simply an equation, or a set of equations, whose 
unknowns are functions which must satisfy (or solve) an equation involving 
both the function and its derivatives. Thus 


\[ \frac{dy}{dx} = 3y \]

is a differential equation whose unknown is the function y(x). Likewise, 
\[ \frac{\partial^2 y}{\partial x^2} - \frac{\partial^2 y}{\partial x\, \partial t} + \frac{\partial y}{\partial x} = x^3 + 3yt \]


is a differential equation with the unknown being the function of two
variables y(x, t). Differential equations fall into two broad classes: ordinary
and partial. Ordinary differential equations (ODEs) are those for which the
unknown functions are functions of only one independent variable. Thus
$\frac{dy}{dx} = 3y$ and

\[ \frac{d^2 y}{dx^2} + \frac{dy}{dx} + \sin(x)\, y = 0 \]

are both ordinary differential equations. As will be seen in the next section,
these almost always have, in principle, solutions.

Partial differential equations (PDEs) have unknowns that are functions
of more than one variable, such as

\[ \frac{\partial^2 y}{\partial x^2} - \frac{\partial^2 y}{\partial t^2} = 0 \]

and

\[ \frac{\partial^2 y}{\partial x^2} + \Bigl(\frac{\partial y}{\partial t}\Bigr)^2 = \cos(xt). \]

Here the unknown is the function of two variables y(x, t). For PDEs,
everything is much murkier as far as solutions go. We will discuss the method
of separation of variables and the method of clever change of variables (if
this can be even called a method). A third method, discussed in Chapter
Thirteen, is to use Fourier transforms.

There is another broad split in differential equations: linear and nonlinear.
A differential equation is homogeneous linear if given two solutions
$f_1$ and $f_2$ and any two numbers $\lambda_1$ and $\lambda_2$, then the function

\[ \lambda_1 f_1 + \lambda_2 f_2 \]

is another solution. Thus the solutions will form a vector space. For example,
$\frac{\partial^2 y}{\partial x^2} - \frac{\partial^2 y}{\partial t^2} = 0$ is homogeneous linear. The differential equation is
linear if subtracting off from the differential equation a function of the
independent variables alone changes it into a homogeneous linear differential
equation. The equation $\frac{\partial^2 y}{\partial x^2} - \frac{\partial^2 y}{\partial t^2} = x$ is linear, since if we subtract
off the function x we have a homogeneous linear equation. The important
fact about linear differential equations is that their solution spaces form 
linear subspaces of vector spaces, allowing linear algebraic ideas to be ap- 
plied. Naturally enough a nonlinear differential equation is one which is 
not linear. 

In practice, one expects to have differential equations arise whenever 
one quantity varies with respect to another. Certainly the basic laws of 
physics are written in terms of differential equations. After all, Newton’s 
second law: 


\[ \text{Force} = (\text{mass}) \cdot (\text{acceleration}) \]

is the differential equation

\[ \text{Force} = (\text{mass}) \cdot \Bigl( \frac{d^2(\text{position})}{dt^2} \Bigr). \]


14.2 Ordinary Differential Equations 


In solving an ordinary differential equation, one must basically undo a 
derivative. Hence solving an ordinary differential equation is basically the 
same as performing an integral. In fact, the same types of problems occur 
in ODEs and in integration theory. 

Most reasonable functions (such as continuous functions) can be integrated.
But to actually recognize the integral of a function as some other,
well-known function (such as a polynomial, trig function, inverse trig function,
exponential or log) is usually not possible. Likewise with ODEs, while
almost all have solutions, only a handful can be solved cleanly and explicitly.
Hence the standard sophomore-level engineering-type ODE course must
inherently have the feel of a bag of tricks applied to special equations.¹

In this section we are concerned with the fact that ODEs have solutions 
and that, subject to natural initial conditions, the solutions will be unique. 
We first see how the solution to a single ODE can be reduced to solving a 
system of first order ODEs, which are equations with unknown functions 
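$y_1(x), \ldots, y_n(x)$ satisfying

\begin{align*}
\frac{dy_1}{dx} &= f_1(x, y_1, \ldots, y_n) \\
&\ \vdots \\
\frac{dy_n}{dx} &= f_n(x, y_1, \ldots, y_n).
\end{align*}

Start with a differential equation of the form:

\[ a_n(x) \frac{d^n y}{dx^n} + \cdots + a_1(x) \frac{dy}{dx} + a_0(x) y(x) + b(x) = 0. \]

We introduce new variables:

\begin{align*}
y_0(x) &= y(x) \\
y_1(x) &= \frac{dy_0}{dx} = \frac{dy}{dx} \\
&\ \vdots \\
y_{n-1}(x) &= \frac{dy_{n-2}}{dx} = \frac{d^{n-1} y}{dx^{n-1}}.
\end{align*}

Then a solution y(x) to the original ODE will give rise to a solution of the
following system of first order ODEs:

\begin{align*}
\frac{dy_0}{dx} &= y_1 \\
\frac{dy_1}{dx} &= y_2 \\
&\ \vdots \\
\frac{dy_{n-1}}{dx} &= -\frac{1}{a_n(x)} \bigl( a_{n-1}(x) y_{n-1} + a_{n-2}(x) y_{n-2} + \cdots + a_0(x) y_0 + b(x) \bigr).
\end{align*}

¹There are reasons and patterns structuring the bag of tricks. These involve a
careful study of the underlying symmetries of the equations. For more, see Peter
Olver’s Applications of Lie Groups to Differential Equations [90].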





If we can solve all such systems of first order ODEs, we can then solve 
all ODEs. Hence the existence and uniqueness theorems for ODEs can be 
couched in the language of systems of first order ODEs. 
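As a small worked instance of this reduction (an illustrative choice, not taken from the text), consider the second order equation $\frac{d^2 y}{dx^2} + y = 0$, so that n = 2 with $a_2 = 1$, $a_1 = 0$, $a_0 = 1$ and $b = 0$. Setting $y_0 = y$ and $y_1 = \frac{dy}{dx}$ gives a system of two first order ODEs:

```latex
\begin{align*}
\frac{dy_0}{dx} &= y_1, \\
\frac{dy_1}{dx} &= -\frac{1}{a_2(x)} \bigl( a_1(x)\, y_1 + a_0(x)\, y_0 + b(x) \bigr) = -y_0.
\end{align*}
```

For example, the solution $y(x) = \sin(x)$ of the original equation corresponds to the pair $(y_0, y_1) = (\sin(x), \cos(x))$, which satisfies both first order equations.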

First to define the special class of functions we are interested in. 


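Definition 14.2.1 A function $f(x, y_1, \ldots, y_n)$ defined on a region T in
$\mathbb{R}^{n+1}$ is Lipschitz if it is continuous and if there is a constant N such that
for every $(x, y_1, \ldots, y_n)$ and $(x, \hat{y}_1, \ldots, \hat{y}_n)$ in T, we have

\[ |f(x, y_1, \ldots, y_n) - f(x, \hat{y}_1, \ldots, \hat{y}_n)| \le N \cdot \bigl( |y_1 - \hat{y}_1| + \cdots + |y_n - \hat{y}_n| \bigr). \]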


It is not a major restriction on a function to require it to be Lipschitz. For 
example, any function with continuous first partial derivatives on an open 
set will be Lipschitz on any connected compact subset. 


Theorem 14.2.1 A system of first order ordinary differential equations

\begin{align*}
\frac{dy_1}{dx} &= f_1(x, y_1, \ldots, y_n) \\
&\ \vdots \\
\frac{dy_n}{dx} &= f_n(x, y_1, \ldots, y_n)
\end{align*}

with each function $f_1, \ldots, f_n$ being Lipschitz in a region T, will have, for
each real number $x_0$, an interval $(x_0 - \epsilon, x_0 + \epsilon)$ on which there are solutions
$y_1(x), \ldots, y_n(x)$. Further, given numbers $a_1, \ldots, a_n$, with $(x_0, a_1, \ldots, a_n)$ in
the region T, the solutions satisfying the initial conditions

\begin{align*}
y_1(x_0) &= a_1 \\
&\ \vdots \\
y_n(x_0) &= a_n
\end{align*}

are unique.
Consider a system of two first order ODEs: 


\begin{align*}
\frac{dy_1}{dx} &= f_1(x, y_1, y_2) \\
\frac{dy_2}{dx} &= f_2(x, y_1, y_2).
\end{align*}

Then a solution $(y_1(x), y_2(x))$ will be a curve in the plane $\mathbb{R}^2$. The theorem
states that there is exactly one solution curve passing through any given
point $(a_1, a_2)$. In some sense the reason why ODEs are easier to solve
than PDEs is that we are trying to find solution curves for ODEs (a one- 
dimensional type problem) while for PDEs the solution sets will have higher 
dimensions and hence far more complicated geometries. 

We will set up the Picard Iteration for finding solutions and then briefly 
describe why this iteration actually works in solving the differential equa- 
tions. 

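For this iterative process, functions $y_{1,k}(x), \ldots, y_{n,k}(x)$ will be constructed
that will approach the true solutions $y_1(x), \ldots, y_n(x)$. Start with setting

\[ y_{i,0}(x) = a_i \]

for each i. Then, at the kth step, define

\begin{align*}
y_{1,k}(x) &= a_1 + \int_{x_0}^{x} f_1(t, y_{1,k-1}(t), \ldots, y_{n,k-1}(t))\, dt \\
&\ \vdots \\
y_{n,k}(x) &= a_n + \int_{x_0}^{x} f_n(t, y_{1,k-1}(t), \ldots, y_{n,k-1}(t))\, dt.
\end{align*}

The crucial part of the theorem is that each of these converges to a solution.
The method is to look at the series, for each i,

\[ y_{i,0}(x) + \sum_{k=1}^{\infty} \bigl( y_{i,k}(x) - y_{i,k-1}(x) \bigr), \]

which has as its Nth partial sum the function $y_{i,N}(x)$. To show that this
series converges comes down to showing that

\[ |y_{i,k}(x) - y_{i,k-1}(x)| \]

approaches zero quickly enough. But this absolute value is equal to

\begin{align*}
& \Bigl| \int_{x_0}^{x} \bigl[ f_i(t, y_{1,k-1}(t), \ldots, y_{n,k-1}(t)) - f_i(t, y_{1,k-2}(t), \ldots, y_{n,k-2}(t)) \bigr]\, dt \Bigr| \\
&\quad \le \int_{x_0}^{x} \bigl| f_i(t, y_{1,k-1}(t), \ldots, y_{n,k-1}(t)) - f_i(t, y_{1,k-2}(t), \ldots, y_{n,k-2}(t)) \bigr|\, dt.
\end{align*}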


The last integral’s size can be controlled by applying the Lipschitz condi- 
tions and showing that it approaches zero. 
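To see the iteration in action, here is a minimal sketch for the single equation dy/dx = y with y(0) = a, an illustrative choice not taken from the text. Because each iterate is then a polynomial, it can be stored exactly as a list of coefficients. The function names are hypothetical.

```python
from fractions import Fraction

def integrate_from_zero(coeffs):
    """Coefficients of the integral from 0 to x of the polynomial sum(coeffs[j] * t**j)."""
    return [Fraction(0)] + [c / (j + 1) for j, c in enumerate(coeffs)]

def picard_step(y_prev, a):
    """One Picard step for dy/dx = y, y(0) = a: y_k(x) = a + integral_0^x y_{k-1}(t) dt."""
    y_next = integrate_from_zero(y_prev)
    y_next[0] += a
    return y_next

def picard_iterates(a, steps):
    """Coefficient list of the iterate reached after the given number of Picard steps."""
    y = [Fraction(a)]              # starting function: the constant a
    for _ in range(steps):
        y = picard_step(y, Fraction(a))
    return y

coeffs = picard_iterates(1, 4)
print(coeffs)   # coefficients of 1, x, x^2, x^3, x^4 in the fourth iterate
```

With a = 1 the iterates are the Taylor polynomials 1 + x + x²/2 + ... of the true solution e^x, which gives a concrete picture of the successive differences shrinking.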
14.3 The Laplacian


14.3.1 Mean Value Principle
In $\mathbb{R}^n$, the Laplacian of a function $u(x) = u(x_1, \ldots, x_n)$ is

\[ \Delta u = \frac{\partial^2 u}{\partial x_1^2} + \cdots + \frac{\partial^2 u}{\partial x_n^2}. \]

One can check that the PDE

\[ \Delta u = 0 \]


is homogeneous and linear and thus that the solutions form a vector space. 
These solutions are important enough to justify their own name. 


Definition 14.3.1 A function $u(x) = u(x_1, \ldots, x_n)$ is harmonic if u(x)
is a solution to the Laplacian:

\[ \Delta u = 0. \]


Much of the importance of the Laplacian is that its solutions, harmonic 
functions, satisfy the Mean Value Principle, which is our next topic. For 
any point $a \in \mathbb{R}^n$, let

\[ S_a(r) = \{ x \in \mathbb{R}^n : |x - a| = r \} \]

be the sphere of radius r centered at a.


Theorem 14.3.1 (Mean Value Principle) If $u(x) = u(x_1, \ldots, x_n)$ is
harmonic, then at any point $a \in \mathbb{R}^n$,

\[ u(a) = \frac{1}{\text{area of } S_a(r)} \int_{S_a(r)} u(x). \]

Thus u(a) is equal to the average value of u(x) on any sphere centered at
a. For a proof of the case when n is two, see almost any text on complex 
analysis. For the general case, see G. Folland’s Introduction to Partial 
Differential Equations [39], section 2.A. 
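The Mean Value Principle is easy to test numerically in the plane, where a sphere is a circle. The sketch below is an illustration under assumptions not in the text: it uses the harmonic function x² − y², a non-harmonic comparison function x² + y², and a hypothetical helper `circle_average` that averages over equally spaced points on the circle.

```python
import math

def circle_average(u, center, radius, samples=360):
    """Average of u over equally spaced points on the circle of given center and radius in R^2."""
    cx, cy = center
    total = 0.0
    for j in range(samples):
        theta = 2 * math.pi * j / samples
        total += u(cx + radius * math.cos(theta), cy + radius * math.sin(theta))
    return total / samples

def u(x, y):
    return x * x - y * y   # harmonic: the second partials are 2 and -2

def w(x, y):
    return x * x + y * y   # not harmonic: the second partials add up to 4

center, radius = (1.0, 2.0), 0.7
print(circle_average(u, center, radius), u(*center))   # these two agree
print(circle_average(w, center, radius), w(*center))   # these two differ
```

For the harmonic function the average matches the value at the center for any radius, while for the comparison function the average exceeds the center value by the square of the radius.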


Frequently, in practice, people want to find harmonic functions on
regions subject to given boundary conditions. This is called:


The Dirichlet Problem: Let R be a region in $\mathbb{R}^n$ with boundary $\partial R$.
Suppose that g is a function defined on this boundary. The Dirichlet Problem
is to find a function f on R satisfying

\[ \Delta f = 0 \]

on R and

\[ f = g \]

on $\partial R$.


One way this type of PDE arises naturally in classical physics is as a 
potential. It is also the PDE used to study a steady-state solution of the 
heat equation. We will see in the next section that heat flow satisfies the 
PDE: 


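\[ \frac{\partial^2 u}{\partial x_1^2} + \cdots + \frac{\partial^2 u}{\partial x_n^2} = c \cdot \frac{\partial u}{\partial t}, \]

where $u(x_1, \ldots, x_n, t)$ denotes the temperature at time t at place $(x_1, \ldots, x_n)$.
By a steady-state solution, we mean a solution that does not change over
time, hence a solution with

\[ \frac{\partial u}{\partial t} = 0. \]

Thus a steady-state solution will satisfy

\[ \Delta u = \frac{\partial^2 u}{\partial x_1^2} + \cdots + \frac{\partial^2 u}{\partial x_n^2} = 0 \]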


and hence is a harmonic function. 


14.3.2 Separation of Variables 


There are a number of ways of finding harmonic functions and of solving 
the Dirichlet Problem, at least when the involved regions are reasonable. 
Here we discuss the method of separation of variables, a method that can 
also frequently be used to solve the heat equation and the wave equation. 
By the way, this technique does not always work. 
We will look at a specific example and try to find the solution function 
u(x, y) to 
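\[ \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0, \]

on the unit square, with boundary conditions

\[ u(x, y) = \begin{cases} h(x) & \text{if } y = 1 \\ 0 & \text{if } x = 0,\ x = 1 \text{ or } y = 0 \end{cases} \]

where h(x) is some initially specified function defined on the top side of the
square.

[Figure: the unit square, labelled with the boundary conditions u(x, 1) = h(x) on the top side and u(1, y) = 0 on the right side.]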





The key assumption will be that the solution will be of the form 


\[ u(x, y) = f(x) \cdot g(y), \]

where

\[ f(0) = 0, \quad g(0) = 0, \quad f(1) = 0, \quad f(x) \cdot g(1) = h(x). \]


This is wild. Few two-variable functions can be written as the product 
of two functions, each a function of one-variable alone. The only possible 
justification is if we can actually find such a solution, which is precisely 
what we will do. (To finish the story, which we will not do, we would need 
to prove that this solution is unique.) If u(x, y) = f(x) · g(y) and if $\Delta u = 0$,
then we need

\[ \frac{d^2 f}{dx^2}\, g(y) + f(x)\, \frac{d^2 g}{dy^2} = 0. \]

Thus we would need

\[ \frac{\frac{d^2 f}{dx^2}}{f(x)} = -\frac{\frac{d^2 g}{dy^2}}{g(y)}. \]

Each side depends on totally different variables, hence each must equal
a constant. Using the boundary conditions f(0) = f(1) = 0, one can show
that this constant must be negative. We denote it by $-c^2$. Thus we need

\[ \frac{d^2 f}{dx^2} = -c^2 f(x) \]

and

\[ \frac{d^2 g}{dy^2} = c^2 g(y), \]

both second order ODEs, which have solutions

\[ f(x) = \lambda_1 \cos(cx) + \lambda_2 \sin(cx) \]


and

\[ g(y) = \mu_1 e^{cy} + \mu_2 e^{-cy}. \]

We now apply the boundary conditions. We have that f(0) = 0, which
implies that

\[ \lambda_1 = 0. \]

Also g(0) = 0 forces

\[ \mu_1 = -\mu_2 \]

and f(1) = 0 means that

\[ \lambda_2 \sin(c) = 0. \]

This condition means that the constant c must be of the form

\[ c = k\pi, \quad \text{with } k = 0, 1, 2, \ldots. \]

Hence the solution must have the form

\[ u(x, y) = f(x) \cdot g(y) = C_k \sin(k\pi x)\bigl(e^{k\pi y} - e^{-k\pi y}\bigr), \]

with $C_k$ some constant.

But we also want u(x, 1) = h(x). Here we need to use that the Laplacian 
is linear and thus that solutions can be added. By adding our various 
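solutions for particular $c = k\pi$, we set

\[ u(x, y) = \sum_{k=1}^{\infty} C_k \bigl(e^{k\pi y} - e^{-k\pi y}\bigr) \sin(k\pi x). \]

All that is left is to find the constants $C_k$. Since we require u(x, 1) = h(x),
we must have

\[ h(x) = \sum_{k=1}^{\infty} C_k \bigl(e^{k\pi} - e^{-k\pi}\bigr) \sin(k\pi x). \]

But this is a series of sines. By the Fourier analysis developed in the last
chapter, we know that

\[ C_k \bigl(e^{k\pi} - e^{-k\pi}\bigr) = 2 \int_0^1 h(x) \sin(k\pi x)\, dx, \]

which, when h(x) is a constant h, equals $\frac{2h(1 - \cos k\pi)}{k\pi}$.

Thus, in that case, the solution is

\[ u(x, y) = \sum_{k=1}^{\infty} \frac{2h(1 - \cos k\pi)}{k\pi \bigl(e^{k\pi} - e^{-k\pi}\bigr)} \sin(k\pi x)\bigl(e^{k\pi y} - e^{-k\pi y}\bigr). \]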


While not pleasant looking, it is an exact solution. 
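The series can be evaluated directly. The sketch below assumes the special case where the top boundary function is the constant 1, so that $C_k (e^{k\pi} - e^{-k\pi}) = \frac{2(1 - \cos k\pi)}{k\pi}$, and rewrites the ratio of exponentials as sinh(kπy)/sinh(kπ). The function name `dirichlet_square` and the number of terms are illustrative choices, not from the text.

```python
import math

def dirichlet_square(x, y, terms=60):
    """Partial sum of the series solution on the unit square with u = 1 on the top side.

    Each term is 2(1 - cos(k pi)) / (k pi) * sin(k pi x) * sinh(k pi y) / sinh(k pi),
    where sinh(k pi y) / sinh(k pi) equals the ratio of the exponential factors.
    """
    total = 0.0
    for k in range(1, terms + 1):
        coefficient = 2 * (1 - math.cos(k * math.pi)) / (k * math.pi)
        total += (coefficient * math.sin(k * math.pi * x)
                  * math.sinh(k * math.pi * y) / math.sinh(k * math.pi))
    return total

print(dirichlet_square(0.5, 0.5))   # value at the center of the square
print(dirichlet_square(0.5, 0.0))   # on the bottom side, where u should vanish
```

A useful check is the center of the square: adding the four solutions obtained by putting the value 1 on each side in turn gives the constant function 1, so by symmetry the value at the center should be 1/4.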
14.3.3 Applications to Complex Analysis


We will now quickly look at an application of harmonic functions. The goal
of Chapter Nine was the study of complex analytic functions f : U → C,
where U is an open set in the complex numbers. One method of describing
such f = u + iv was that the real and imaginary parts of f had to satisfy
the Cauchy-Riemann equations:

\[ \frac{\partial u(x,y)}{\partial x} = \frac{\partial v(x,y)}{\partial y} \]

and

\[ \frac{\partial u(x,y)}{\partial y} = -\frac{\partial v(x,y)}{\partial x}. \]

Both real-valued functions u and v are harmonic. The harmonicity of u
(and in a similar fashion that of v) can be seen, using the Cauchy-Riemann
equations, via:

\begin{align*}
\Delta u &= \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} \\
&= \frac{\partial}{\partial x} \frac{\partial v}{\partial y} - \frac{\partial}{\partial y} \frac{\partial v}{\partial x} \\
&= 0.
\end{align*}

One approach to complex analysis is to push hard on the harmonicity of 
the real-valued functions u and v. 
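A quick concrete instance, chosen here for illustration and not taken from the text, is the analytic function $f(z) = z^2$, whose real and imaginary parts can be checked by hand:

```latex
\begin{align*}
f(z) = (x + iy)^2 &= (x^2 - y^2) + i\,(2xy), \qquad u = x^2 - y^2, \quad v = 2xy, \\
\frac{\partial u}{\partial x} &= 2x = \frac{\partial v}{\partial y}, \qquad
\frac{\partial u}{\partial y} = -2y = -\frac{\partial v}{\partial x}, \\
\Delta u &= 2 + (-2) = 0, \qquad \Delta v = 0 + 0 = 0.
\end{align*}
```

Both Cauchy-Riemann equations hold, and each of u and v is harmonic, as the general argument predicts.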


14.4 The Heat Equation 


We will first describe the partial differential equation that is called the Heat 
Equation and then give a physics-type heuristic argument as to why this 
particular PDE should model heat flow. In a region in $\mathbb{R}^3$ with the usual
coordinates x, y, z, let

\[ u(x, y, z, t) = \text{temperature at time } t \text{ at } (x, y, z). \]

Definition 14.4.1 The heat equation is:

\[ \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + \frac{\partial^2 u}{\partial z^2} = c \cdot \frac{\partial u}{\partial t}, \]

where c is a constant.

Frequently one starts with an initial specified temperature distribution,
such as

\[ u(x, y, z, 0) = f(x, y, z), \]

with f(x, y, z) some known, given function.


Surprisingly, the heat equation shows up throughout mathematics and 
the sciences, in many contexts for which no notion of heat or temperature is 
apparent. The common theme is that heat is a type of diffusion process and 
that the heat equation is the PDE that will capture any diffusion process. 
Also, there are a number of techniques for solving the heat equation. In fact, 
using Fourier Analysis, we solved it in the one-dimensional case in Chapter 
Thirteen. The method of separation of variables, used in the last section to
solve the Laplacian, can also be used.

Now to see why the above PDE deserves the name ‘heat equation’. As 
seen in the last section, 


\[ \Delta u = \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + \frac{\partial^2 u}{\partial z^2} \]

is the Laplacian. In non-rectilinear coordinates, the Laplacian will have
different looking forms, but the heat equation will always be:

\[ \Delta u = c \cdot \frac{\partial u}{\partial t}. \]

For simplicity, we restrict ourselves to the one-dimensional case. Consider
an infinitely long rod, which we denote by the x-axis.


[Figure: a segment of length $\Delta x$ marked on the rod along the x-axis.]


Though the basic definitions of heat and temperature are and were fraught 
with difficulties, we will assume that there is a notion of temperature and 
that heat is measured via the change in temperature. Let u(x,t) denote the 
temperature at position x at time t. We now denote the change in a variable
by $\Delta u$, $\Delta x$, $\Delta t$, etc. Note that here $\Delta$ is not denoting the Laplacian of
these variables.

There are three important constants associated to our rod, all coming
from the real world: the density $\rho$, the thermal conductivity k and the
specific heat $\sigma$. The density arises in that the mass m of the rod over a
distance $\Delta x$ will be the product $\rho \cdot \Delta x$. The specific heat is the number
$\sigma$ such that, if a length $\Delta x$ of the rod has its temperature u raised to $u + \Delta u$,
then its heat will change by $\sigma \cdot (\text{mass}) \cdot \Delta u$. Note that this last number
is the same as $\sigma \cdot \rho \cdot \Delta x \cdot \Delta u$. Here we are using the notion that heat is a
measure of the change in temperature. Finally, the thermal conductivity k
is the constant that yields

\[ k \cdot \frac{\Delta u}{\Delta x}\Big|_{x} \]

as the amount of heat that can flow through the rod at a fixed point x. Via
physical experiments, these constants can be shown to exist.

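We want to see how much heat flows in and out of the interval $[x, x + \Delta x]$.
By calculating this heat flow by two different methods, and then letting
$\Delta x \to 0$, the heat equation will appear. First, if the temperature changes
by $\Delta u$, the heat will change by

\[ \sigma \cdot \rho \cdot \Delta x \cdot \Delta u. \]

Second, at the point $x + \Delta x$, the amount of heat flowing out will be,
over time $\Delta t$,

\[ k \cdot \frac{\Delta u}{\Delta x}\Big|_{x + \Delta x} \Delta t. \]

[Figure: the segment of the rod from $x$ to $x + \Delta x$, with heat flow $-k \frac{\Delta u}{\Delta x}\big|_{x} \Delta t$ out of the $x$ end and $k \frac{\Delta u}{\Delta x}\big|_{x + \Delta x} \Delta t$ out of the $x + \Delta x$ end.]

At the point x, the amount of heat flowing out will be, over time $\Delta t$,

\[ -k \cdot \frac{\Delta u}{\Delta x}\Big|_{x} \Delta t. \]

Then the heat change over the interval $\Delta x$ will also be

\[ \Bigl( k \frac{\Delta u}{\Delta x}\Big|_{x + \Delta x} - k \frac{\Delta u}{\Delta x}\Big|_{x} \Bigr) \Delta t. \]

Thus

\[ \Bigl( k \frac{\Delta u}{\Delta x}\Big|_{x + \Delta x} - k \frac{\Delta u}{\Delta x}\Big|_{x} \Bigr) \Delta t = \sigma \rho\, \Delta x\, \Delta u. \]

Then

\[ \frac{ \frac{\Delta u}{\Delta x}\big|_{x + \Delta x} - \frac{\Delta u}{\Delta x}\big|_{x} }{\Delta x} = \frac{\sigma \rho}{k} \frac{\Delta u}{\Delta t}. \]

Letting $\Delta x$ and $\Delta t$ approach 0, we get by the definition of partial
differentiation the heat equation

\[ \frac{\partial^2 u}{\partial x^2} = \frac{\sigma \rho}{k} \frac{\partial u}{\partial t}. \]

In fact, we see that the constant c is

\[ c = \frac{\sigma \rho}{k}. \]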


Again, there are at least two other methods for solving the heat equation. 
We can, for example, use Fourier transforms, which is what we used to 
solve it in Chapter Thirteen. We can also use the method of separation of 
variables, discussed in the previous section. 


14.5 The Wave Equation 


14.5.1 Derivation 


As its name suggests, this partial differential equation was originally derived 
to describe the motion of waves. As with the heat equation, its basic 
form appears in many apparently non-wave-like areas. We will state the 
wave equation and then give a quick heuristic description of why the wave 
equation should describe waves. 

A transverse wave in the x — y plane travelling in the x-direction should 
look like: 


The solution function is denoted by y(x, t), which is just the y coordinate 
of the wave at place x at time t. The wave equation in two independent 
variables is 

\[ \frac{\partial^2 y}{\partial x^2} - c \cdot \frac{\partial^2 y}{\partial t^2} = 0, \]

where c is a positive number. Usually we start with some type of knowledge
of the initial position of the wave. This will of course mean that we are
given an initial function f(x) such that

\[ y(x, 0) = f(x). \]

[Figure: the initial position of the wave, given by the graph of f(x).]
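In general, the wave equation in n variables $x_1, \ldots, x_n$ with initial condition
$f(x_1, \ldots, x_n)$ is

\[ \frac{\partial^2 y}{\partial x_1^2} + \cdots + \frac{\partial^2 y}{\partial x_n^2} - c \cdot \frac{\partial^2 y}{\partial t^2} = 0 \]

with initial condition

\[ y(x_1, \ldots, x_n, 0) = f(x_1, \ldots, x_n). \]

In nonrectilinear coordinates, the wave equation will be:

\[ \Delta y(x_1, \ldots, x_n, t) - c \cdot \frac{\partial^2 y}{\partial t^2} = 0. \]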

Now to see the heuristics behind why this partial differential equation is 
even called the wave equation. Of course we need to make some physical as- 
sumptions. Assume that the wave is a string moving in an ‘elastic’ medium, 
meaning that subject to any displacement, there is a restoring force, some- 
thing trying to move the string back to where it was. We further assume 
that the initial disturbance is small. We will use that 


Force = (mass) - (acceleration). 


We let our string have density $\rho$ and assume that there is a tension T in
the string (this tension will be what we call the restoring force) which will 
act tangentially on the string. Finally, we assume that the string can only 
move vertically. 

Consider the wave

[Figure: a segment of the wave of arc length $\Delta s$.]

Let s denote the arc length of the curve. We want to calculate the restoring
force acting on the segment $\Delta s$ of the curve in two different ways and then
let $\Delta s \to 0$. Since the density is $\rho$, the mass of the segment $\Delta s$ will be the
product $(\rho \cdot \Delta s)$. The acceleration is the second derivative. Since we are
assuming that the curve can only move vertically (in the y-direction), the
acceleration will be $\frac{\partial^2 y}{\partial t^2}$. Thus the force will be

\[ (\rho \cdot \Delta s) \cdot \frac{\partial^2 y}{\partial t^2}. \]

By the assumption that the displacement is small, we can approximate the
arc length $\Delta s$ by the change in the x-direction alone.

[Figure: for a small displacement, $\Delta s \approx \Delta x$.]

Hence we assume that the restoring force is

\[ (\rho\, \Delta x) \cdot \frac{\partial^2 y}{\partial t^2}. \]


Now to calculate the restoring force in a completely different way. At 
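each point in the picture

[Figure: the segment of the string between $x$ and $x + \Delta x$, with the tension T acting tangentially at angle $\theta_1$ at $x$ and $\theta_2$ at $x + \Delta x$.]

the tension T gives rise to an acceleration tangent to the curve. We want
the y component. At the point $x + \Delta x$, the restoring force will be

\[ T \sin \theta_2. \]

At the point x, the restoring force will be

\[ -T \sin \theta_1. \]

Since both angles $\theta_1$ and $\theta_2$ are small, we can use the following
approximation:

\[ \sin \theta_1 \approx \tan \theta_1 = \frac{\partial y}{\partial x}\Big|_{x}, \qquad \sin \theta_2 \approx \tan \theta_2 = \frac{\partial y}{\partial x}\Big|_{x + \Delta x}. \]

Then we can set the restoring force to be

\[ T \Bigl( \frac{\partial y}{\partial x}\Big|_{x + \Delta x} - \frac{\partial y}{\partial x}\Big|_{x} \Bigr). \]

As we have now calculated the restoring force in two different ways, we can
set the two formulas equal:

\[ T \Bigl( \frac{\partial y}{\partial x}\Big|_{x + \Delta x} - \frac{\partial y}{\partial x}\Big|_{x} \Bigr) = \rho\, \Delta x \cdot \frac{\partial^2 y}{\partial t^2}, \]

or

\[ \frac{ \frac{\partial y}{\partial x}\big|_{x + \Delta x} - \frac{\partial y}{\partial x}\big|_{x} }{\Delta x} = \frac{\rho}{T} \frac{\partial^2 y}{\partial t^2}. \]

Letting $\Delta x \to 0$, we get

\[ \frac{\partial^2 y}{\partial x^2} = \frac{\rho}{T} \frac{\partial^2 y}{\partial t^2}, \]

the wave equation.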

Now to see what solutions look like. We assume that y(0) = 0 and 
y(L) = 0, for some constant L. Thus we restrict our attention to waves 
which have fixed endpoints. 

An exercise at the end of the chapter will ask you to solve the wave equa- 
tion using the method of separation of variables and via Fourier transforms. 
Your answer will in fact be: 


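$$y(x,t) = \sum_{n=1}^{\infty} k_n \sin\left(\frac{n\pi x}{L}\right)\cos\left(\frac{n\pi t}{L}\right)$$

where

$$k_n = \frac{2}{L}\int_0^L f(x)\sin\left(\frac{n\pi x}{L}\right)dx.$$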
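
As a rough numerical sanity check of this series (a sketch only, not a solution of the exercise), the following Python computes the coefficients $k_n$ for one sample initial position and compares the truncated sum at t = 0 with that position. Here f is taken to be the initial position y(x,0) = f(x), as in the exercise; the particular choice f(x) = x(L - x), the value L = 1, the truncation at 25 terms and all function names are illustrative assumptions.

```python
import math

L = 1.0  # assumed length of the string


def f(x):
    """Assumed initial position; it vanishes at both endpoints."""
    return x * (L - x)


def coefficient(n, panels=4000):
    """k_n = (2/L) * integral_0^L f(x) sin(n pi x / L) dx, by the midpoint rule."""
    h = L / panels
    total = 0.0
    for i in range(panels):
        x = (i + 0.5) * h
        total += f(x) * math.sin(n * math.pi * x / L)
    return 2.0 / L * total * h


COEFFS = [coefficient(n) for n in range(1, 26)]  # k_1, ..., k_25


def y(x, t):
    """Truncated series: sum of k_n sin(n pi x / L) cos(n pi t / L)."""
    return sum(
        k * math.sin(n * math.pi * x / L) * math.cos(n * math.pi * t / L)
        for n, k in enumerate(COEFFS, start=1)
    )
```

Evaluating y(x, 0) at a few points should reproduce f(x) closely, and y(0, t) and y(L, t) should stay at (numerically) zero, matching the fixed endpoints.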


14.5.2 Change of Variables 


Sometimes a clever change of variables will reduce the original PDE to a 
more manageable one. We will see this in the following solution of the 
wave equation. Take an infinitely long piece of string. Suppose we pluck 
the string in the middle and then let go. 


[Figure: the plucked string, with its peak at 0.]


After a short time, we should get:


[Figure: the string a short time later, with two bumps on either side of 0.]


with seemingly two waves moving in opposite directions but at the same 
speed. With much thought and cleverness, one might eventually try to 
change coordinate systems in an attempt to capture these two waves. 
Thus suppose we want to solve

$$\frac{\partial^2 y}{\partial x^2} - \frac{1}{c^2}\,\frac{\partial^2 y}{\partial t^2} = 0,$$

subject to the initial conditions

$$y(x,0) = g(x) \quad\text{and}\quad \frac{\partial y}{\partial t}(x,0) = h(x)$$

for given functions g(x) and h(x). Note that we have relabelled the constant
in the wave equation to be $1/c^2$. This is done solely for notational convenience,
as we will see in a moment.

Now to make the change of variables. Set 


$$u = x + ct \quad\text{and}\quad v = x - ct.$$


Using the chain rule, this coordinate change transforms the original wave
equation into:

$$\frac{\partial^2 y}{\partial u\,\partial v} = 0.$$

We can solve this PDE by two straightforward integrations. First integrate
with respect to the variable u to get

$$\frac{\partial y}{\partial v} = a(v),$$

where a(v) is an unknown function of the variable v alone. This new function
a(v) is the ‘constant of integration’, constant with respect to the u
variable. Now integrate this with respect to v to get

$$y(u,v) = A(v) + B(u),$$

where A(v) is the integral of a(v) and B(u) is the term representing the
‘constant of integration’ with respect to v. Thus the solution y(u, v) is the 
sum of two, for now unknown, functions, each a function of one variable 
alone. Plugging back into our original coordinates means that the solution 
will have the form: 


y(u,v) = A(x — ct) + B(x + ct). 


We use our initial conditions to determine the functions A(x — ct) and 
B(x + ct). We have 


$$g(x) = y(x,0) = A(x) + B(x)$$

and

$$h(x) = \frac{\partial y}{\partial t}(x,0) = -cA'(x) + cB'(x).$$

For this last equation, integrate with respect to the one variable x, to get
that

$$\int_0^x h(s)\,ds + C = -cA(x) + cB(x).$$

Since we are assuming that the functions g(x) and h(x) are known, we can
now solve for A(x) and B(x), to get:

$$A(x) = \frac{1}{2}g(x) - \frac{1}{2c}\int_0^x h(s)\,ds - \frac{C}{2c}$$

and

$$B(x) = \frac{1}{2}g(x) + \frac{1}{2c}\int_0^x h(s)\,ds + \frac{C}{2c}.$$


Then the solution is: 


$$y(x,t) = A(x-ct) + B(x+ct) = \frac{g(x-ct)+g(x+ct)}{2} + \frac{1}{2c}\int_{x-ct}^{x+ct} h(s)\,ds.$$


This is called the d’Alembert formula. Note that if the initial velocity
h(x) = 0, then the solution is simply

$$y(x,t) = \frac{g(x-ct)+g(x+ct)}{2},$$


which is two waves travelling in opposite directions, each looking like the 
initial position. (Though this is a standard way to solve the wave equa- 
tion, I took the basic approach from Davis’ Fourier Series and Orthogonal 
Functions [24].) 
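
The d’Alembert formula can be checked numerically. The following Python is a minimal sketch, assuming a Gaussian initial position g, an initial velocity h(x) = x e^{-x^2} and wave speed c = 2 (all three chosen only for illustration, as are the helper names). It evaluates the formula with Simpson's rule for the integral and then uses finite differences to estimate the left side of the wave equation.

```python
import math


def simpson(func, a, b, panels=200):
    """Composite Simpson rule on [a, b]; panels must be even."""
    step = (b - a) / panels
    total = func(a) + func(b)
    for i in range(1, panels):
        total += (4 if i % 2 else 2) * func(a + i * step)
    return total * step / 3


def dalembert(g, h, c):
    """Return y(x, t) built from the d'Alembert formula."""
    def y(x, t):
        travelling = 0.5 * (g(x - c * t) + g(x + c * t))
        velocity = simpson(h, x - c * t, x + c * t) / (2 * c)
        return travelling + velocity
    return y


def g(x):
    return math.exp(-x * x)        # assumed initial position


def h(x):
    return x * math.exp(-x * x)    # assumed initial velocity


c = 2.0                            # assumed wave speed
y = dalembert(g, h, c)


def residual(x, t, d=1e-2):
    """Finite-difference estimate of y_xx - (1/c^2) y_tt."""
    y_xx = (y(x + d, t) - 2 * y(x, t) + y(x - d, t)) / d ** 2
    y_tt = (y(x, t + d) - 2 * y(x, t) + y(x, t - d)) / d ** 2
    return y_xx - y_tt / c ** 2
```

The residual should be close to zero at any point (up to discretization error), y(x, 0) should equal g(x), and a centered difference in t at t = 0 should approximate h(x).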

This method leaves the question of how to find a good change of coor- 
dinates unanswered. This is an art, not a science. 


14.6 The Failure of Solutions: Integrability 
Conditions 

There are no known general methods for determining when a system of
partial differential equations has a solution. Frequently, though, there are
necessary conditions (usually called ‘integrability conditions’) for there to
be a solution.


We will look at the easiest case. When will there be a two-variable
function f(x,y), defined on the plane $\mathbb{R}^2$, satisfying:

$$\frac{\partial f}{\partial x} = g_1(x,y)$$

and

$$\frac{\partial f}{\partial y} = g_2(x,y),$$

where both $g_1$ and $g_2$ are differentiable functions? In this standard
result from multivariable calculus, there are clean necessary and sufficient
conditions for the solution function f to exist:


Theorem 14.6.1 There is a solution f to the above system of partial
differential equations if and only if

$$\frac{\partial g_1}{\partial y} = \frac{\partial g_2}{\partial x}.$$

In this case, the integrability condition is $\partial g_1/\partial y = \partial g_2/\partial x$. As we will see, this is
the easy part of the theorem; it is also the model for integrability conditions
in general.

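Proof: First assume that we have our solution f satisfying $\partial f/\partial x = g_1(x,y)$
and $\partial f/\partial y = g_2(x,y)$. Then

$$\frac{\partial g_1}{\partial y} = \frac{\partial}{\partial y}\frac{\partial f}{\partial x} = \frac{\partial}{\partial x}\frac{\partial f}{\partial y} = \frac{\partial g_2}{\partial x}.$$

Thus the integrability condition is just a consequence of the fact that the
order for taking partial derivatives does not matter.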

The other direction takes more work. As a word of warning, Green’s
Theorem will be critical. We must find a function f(x,y) satisfying the
given system of PDEs. Given any point (x,y) in the plane, let $\gamma$ be any
smooth path from the origin (0,0) to (x,y). Define

$$f(x,y) = \int_{\gamma} g_1(x,y)\,dx + g_2(x,y)\,dy.$$

We first show that the function f(x,y) is well-defined, meaning that its
value is independent of which path $\gamma$ is chosen. This will then allow us to
show that $\partial f/\partial x = g_1(x,y)$ and $\partial f/\partial y = g_2(x,y)$. Let $\tau$ be another smooth path
from (0,0) to (x,y).

[Figure: two paths $\gamma$ and $\tau$ from the origin to (x,y).]

We want to show that

$$\int_{\gamma} g_1(x,y)\,dx + g_2(x,y)\,dy = \int_{\tau} g_1(x,y)\,dx + g_2(x,y)\,dy.$$


We can consider $\gamma - \tau$ as a closed loop at the origin, enclosing a region R.
(Note: it might be the case that $\gamma - \tau$ encloses several regions, but then
just apply the following to each of these regions.) By Green’s Theorem we
have

$$\int_{\gamma} g_1\,dx + g_2\,dy - \int_{\tau} g_1\,dx + g_2\,dy = \int_{\gamma-\tau} g_1\,dx + g_2\,dy = \iint_R \left(\frac{\partial g_2}{\partial x} - \frac{\partial g_1}{\partial y}\right)dx\,dy = 0$$

by the assumption that $\partial g_1/\partial y = \partial g_2/\partial x$. Thus the function f(x,y) is well-defined.
Now to show that this function f satisfies $\partial f/\partial x = g_1(x,y)$ and
$\partial f/\partial y = g_2(x,y)$. We will just show the first, as the second is similar. The key is
that we will reduce the problem to the Fundamental Theorem of Calculus.


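Fix a point $(x_0, y_0)$. Consider any path $\gamma$ from (0,0) to $(x_0, y_0)$ and the
extension $\gamma' = \gamma + \tau$, where $\tau$ is the horizontal line from $(x_0, y_0)$ to $(x, y_0)$.

[Figure: the path $\gamma$ from the origin to $(x_0, y_0)$, extended by the horizontal segment $\tau$.]

Then

$$\frac{\partial f}{\partial x}(x_0,y_0) = \lim_{x\to x_0}\frac{f(x,y_0) - f(x_0,y_0)}{x - x_0} = \lim_{x\to x_0}\frac{1}{x - x_0}\int_{x_0}^{x} g_1(t,y_0)\,dt,$$

since there is no variation in the y-direction, forcing the $g_2$ part of the
path integral to drop out. This last limit, by the Fundamental Theorem of
Calculus, is equal to $g_1(x_0,y_0)$, as desired. □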
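
The path independence at the heart of this proof can be seen numerically. The sketch below is illustrative only: the pair $g_1 = 2xy$, $g_2 = x^2$ (which satisfies the integrability condition), the pair $g_1 = -y$, $g_2 = x$ (which does not), the two paths from (0,0) to (1,2) and the function names are all assumptions chosen for the example. The line integral is approximated by sampling the integrand at the midpoint of each small chord along the path.

```python
def line_integral(g1, g2, path, steps=20000):
    """Approximate the integral of g1 dx + g2 dy along path(t), 0 <= t <= 1."""
    total = 0.0
    x0, y0 = path(0.0)
    for i in range(1, steps + 1):
        x1, y1 = path(i / steps)
        xm, ym = (x0 + x1) / 2, (y0 + y1) / 2   # midpoint of the chord
        total += g1(xm, ym) * (x1 - x0) + g2(xm, ym) * (y1 - y0)
        x0, y0 = x1, y1
    return total


def straight(t):
    return (t, 2 * t)            # straight line from (0,0) to (1,2)


def curved(t):
    return (t, 2 * t ** 3)       # a different smooth path, same endpoints


# Satisfies dg1/dy = dg2/dx = 2x; here f(x, y) = x^2 y works.
def good_g1(x, y):
    return 2 * x * y


def good_g2(x, y):
    return x * x


# Violates the condition: dg1/dy = -1 but dg2/dx = 1.
def bad_g1(x, y):
    return -y


def bad_g2(x, y):
    return x
```

For the first pair both paths give approximately f(1,2) = 2, while for the second pair the two paths give different values, so no function f can exist.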


14.7 Lewy’s Example 


Once you place any natural integrability conditions on a system of partial 
differential equations, you can then ask if there will always be a solution. 
In practice, often such general statements about the existence of solutions
can be made. For example, in the middle of the twentieth century it was
shown that given any complex numbers $a_1, \ldots, a_n$ and any smooth function
$g(x_1, \ldots, x_n)$, there always exists a smooth solution $f(x_1, \ldots, x_n)$ satisfying

$$a_1\frac{\partial f}{\partial x_1} + \cdots + a_n\frac{\partial f}{\partial x_n} = g.$$

Based in part on these types of results, it was the belief that all reasonable
PDEs would have solutions. Then, in 1957, Hans Lewy showed the amazing
result that the linear PDE

$$\frac{\partial f}{\partial x} + i\frac{\partial f}{\partial y} - (x + iy)\frac{\partial f}{\partial z} = g(x,y,z)$$

will have a solution f only if g is real-analytic. Note that while this PDE
does not have constants as coefficients, the coefficients are about as
reasonable as you could want. Lewy’s proof, while not hard (see Folland’s
book on PDEs [39]), did not give any real indication as to why there is no
solution. In the early 1970s, Nirenberg showed that the Lewy PDE did not
have a solution because there exists a three-dimensional CR structure
(a certain type of manifold) that cannot be embedded into a complex
space, thus linking a geometric condition to the question of existence of
solutions for this PDE. This is a common tack, namely to concentrate on PDEs whose
solutions have some type of geometric meaning. Then, in trying to find the
solution, use the geometry as a guide.


14.8 Books 


Since beginning differential equations is a standard sophomore level course, 
there are many beginning textbooks. Boyce and DiPrima’s book [12] has
long been a standard. Simmons’ book [99] is also good. Another approach
to learning basic ODEs is to volunteer to TA or teach such a class (though 
I would recommend that you teach linear algebra and vector calculus first). 
Moving into the realm of PDEs the level of text becomes much harder and 
more abstract. I have learned a lot from Folland’s book [39]. Fritz John’s 
book [69] has long been a standard. I have heard that Evans’ recent book 
[33] is also excellent. 


14.9 Exercises 


1. The most basic differential equation is probably

$$\frac{dy}{dx} = y,$$

subject to the boundary condition y(0) = 1. The solution is of course the
exponential function $y(x) = e^x$. Use Picard iteration to show that this is
indeed the solution to $dy/dx = y$. (Of course you get an answer as a power
series and then need to recognize that the power series is $e^x$. The author
realizes that if you know the power series for the exponential function you
also know that it is its own derivative. The goal of this problem is to see
explicitly how Picard iteration works on the simplest possible differential
equation.)

2. Let f(x) be a one variable function, with domain the interval [0,1], 
whose first derivative is continuous. Show that f is Lipschitz. 

3. Show that $f(x) = e^x$ is not Lipschitz on the real numbers.

4. Solve the wave equation 


$$\frac{\partial^2 y}{\partial x^2} - \frac{\partial^2 y}{\partial t^2} = 0$$

subject to the boundary conditions y(0,t) = 0 and y(L,t) = 0 and the
initial condition y(x,0) = f(x) for some function f(x).

a. Use the method of separation of variables as described in the section 
on the Laplacian. 
b. Now find the solutions using Fourier transforms. 


Chapter 15 


Combinatorics and
Probability Theory


Basic Goals: Cleverly Counting Large Finite Sets 
Central Limit Theorem 


Beginning probability theory is basically the study of how to count large 
finite sets, or in other words, an application of combinatorics. Thus the 
first section of this chapter deals with basic combinatorics. The next three 
sections deal with the basics of probability theory. Unfortunately, counting 
will only take us so far in probability. If we want to see what happens 
as we, for example, play a game over and over again, methods of calculus 
become important. We concentrate on the Central Limit Theorem, which is 
where the famed Gauss-Bell curve appears. The proof of the Central Limit 
Theorem is full of clever estimates and algebraic tricks. We include this 
proof not only due to the importance of the Central Limit Theorem but 
also to show people that these types of estimates and tricks are sometimes 
needed in mathematics. 


15.1 Counting 


There are many ways to count. The most naive method, the one we learn 
as children, is simply to explicitly count the elements in a set, and this 
method is indeed the best one for small sets. Unfortunately, many sets are 
just too large for anyone to merely count the elements. Certainly in large 
part the fascination in card games such as poker and bridge is that while 
there are only a finite number of possible hands, the actual number is far 
too large for anyone to deal with directly, forcing the players to develop 
strategies and various heuristic devices. Combinatorics is the study of
how to cleverly count. Be warned that the subject can quickly get quite 
difficult and is becoming increasingly important in mathematics. 

We will look at the simplest of combinatorial formulas, ones that have 
been known for centuries. Start with n balls. Label each ball with a
number 1, 2, ..., n and then put the balls into an urn. Pull one out, record
its number and then put the ball back in. Again, pull out a ball and record
its number and put it back into the urn. Keep this up until k balls have
been pulled out and put back into the urn. We want to know how many
different k-tuples of numbers are possible.

To pull out two balls from a three-ball urn (here n = 3 and k = 2), we 
can just list the possibilities: 


(1,1), (1,2), (1,3), (2,1), (2,2), (2,3), (3,1), (3,2), (3,3).


But if we pull out seventy-six balls from a ninety-nine ball urn (here n = 99 
and k = 76), it would be ridiculous to make this list. 

Nevertheless, we can find the correct number. There are n possibilities 
for the first number, n possibilities for the second, n for the third, etc. Thus 
all told there must be $n^k$ possible ways to choose k-tuples of n numbers.
This is a formula that works no matter how many balls we have or how 
many times we choose a ball. 

For the next counting problem, return to the urn. Pull out a ball, record 
its number and keep it out. Now pull out another ball, record its number 
and keep it out. Continue pulling out balls and not replacing them. Now 
we want to find out how many k-tuples of n numbers there are without 
replacement. There are n possibilities for the first number, only (n — 1) 
possibilities for the second, (n — 2) for the third, etc. Thus the number of 
ways of choosing from n balls k times without replacement is: 


$$n(n-1)(n-2)\cdots(n-k+1).$$


For our next counting problem, we want to find out how many ways 
there are for pulling out k balls from an urn with n balls, but now not only 
not replacing the balls but also not caring about the order of the balls. 
Thus pulling out the balls (1,2,3) will be viewed as equivalent to pulling 
out the balls (2,1,3). Suppose we have already pulled out k of the balls. 
We want to see how many ways there are of mixing up these k balls. But 
this should be the same as how many ways are there of choosing from k 
balls k times, which is 


$$k(k-1)(k-2)\cdots 2\cdot 1 = k!.$$


Since $n(n-1)(n-2)\cdots(n-k+1)$ is the number of ways of choosing from n
balls k times with order mattering and with each ordering capable of being
mixed up k! ways, we have

$$\frac{n(n-1)\cdots(n-k+1)}{k!} = \frac{n!}{k!\,(n-k)!},$$

which is the number of ways of choosing k balls from n balls without
replacement and with order not mattering. This number comes up so often
it has its own symbol

$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!},$$

pronounced ‘n choose k’. It is frequently called the binomial coefficient,
due to its appearance in the Binomial Theorem:

$$(a+b)^n = \sum_{k=0}^{n}\binom{n}{k}a^k b^{n-k}.$$

The idea is that $(a+b)^n = (a+b)(a+b)\cdots(a+b)$. To calculate how many
different terms of the form $a^k b^{n-k}$ we can get, we note that this is the same
as counting how many ways we can choose k things from n things without 
replacement and with ordering not mattering. 
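
The three counting formulas can be checked by brute force on the small urn from the text (n = 3, k = 2). This is a minimal sketch using Python's standard library (`math.perm` and `math.comb` require Python 3.8 or later); the variable and function names are illustrative.

```python
import itertools
import math

n, k = 3, 2
balls = range(1, n + 1)

# Ordered draws with replacement: k-tuples of numbers.
with_replacement = list(itertools.product(balls, repeat=k))
# Ordered draws without replacement.
without_replacement = list(itertools.permutations(balls, k))
# Unordered draws without replacement.
unordered = list(itertools.combinations(balls, k))

counts = (len(with_replacement), len(without_replacement), len(unordered))
formulas = (n ** k, math.perm(n, k), math.comb(n, k))


def binomial_expansion(a, b, n):
    """Right-hand side of the Binomial Theorem for integers a, b."""
    return sum(math.comb(n, j) * a ** j * b ** (n - j) for j in range(n + 1))
```

Listing the tuples is only feasible for small urns; for n = 99 and k = 76 one would rely on the formulas alone, for example `math.comb(99, 76)`.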


15.2 Basic Probability Theory 


We want to set up the basic definitions of elementary probability theory. 
These definitions are required to yield the results we all know, such as that 
there is a fifty-fifty chance of flipping a coin and getting heads, or that there 
is a one in four chance of drawing a heart from a standard deck of 52 cards. 
Of course, as always, the reason for worrying about the basic definitions 
is not just to understand the obvious odds of getting heads but that the 
correct basic definition will allow us to compute the probabilities of events 
that are quite complicated. 

We start with the notion of a sample space $\omega$, which technically is just
another name for a set. Intuitively, a sample space $\omega$ is the set whose
elements are what can happen, or more precisely, the possible outcomes of
an event. For example, if we flip a coin twice, $\omega$ will be a set with the four
elements

{(heads, heads), (heads, tails), (tails, heads), (tails, tails)}.


Definition 15.2.1 Let $\omega$ be a sample space and A a subset of $\omega$. Then the
probability of A, denoted by P(A), is the number of elements in A divided
by the number of elements in the sample space $\omega$. Thus

$$P(A) = \frac{|A|}{|\omega|},$$

where |A| denotes the number of elements in the set A.
For example, if

$\omega$ = {(heads, heads), (heads, tails), (tails, heads), (tails, tails)},

and if A = {(heads, heads)}, then the probability of flipping a coin twice
and getting two heads will be

$$P(A) = \frac{|A|}{|\omega|} = \frac{1}{4},$$

which agrees with common sense.


In this framework, many of the basic rules of probability reduce to rules 
of set theory. For example, via sets, we see that 


$$P(A \cup B) = P(A) + P(B) - P(A \cap B).$$


Frequently, a subset A of a sample space $\omega$ is called an event.

There are times when it is too much trouble to actually translate a
real-world probability problem into a question of size of sets. For example,
suppose we are flipping an unfair coin, where there is a 3/4 chance of getting
a head and a 1/4 chance of getting tails. We could model this by taking
our sample set to be

$\omega$ = {heads$_1$, heads$_2$, heads$_3$, tails},

where we are using subscripts to keep track of the different ways of getting
heads, but this feels unnatural. A more natural sample space would be

$\omega$ = {heads, tails},

and to somehow account for the fact that it is far more likely to get heads
than tails. This leads to another definition of a probability space:


Definition 15.2.2 A probability space is a set $\omega$, called the sample space,
and a function

$$P : \omega \to [0,1]$$

such that

$$\sum_{a \in \omega} P(a) = 1.$$

We say that the probability of getting an ‘a’ is the value of P(a).
If on a sample space $\omega$ it is equally likely to get any single element of $\omega$,
i.e., for all $a \in \omega$ we have

$$P(a) = \frac{1}{|\omega|},$$

then our ‘size of set’ definition for probability will agree with this second
definition. For the model of flipping an unfair coin, this definition will give
us that the sample set is:

$\omega$ = {heads, tails},

but that P(heads) = 3/4 and P(tails) = 1/4.
We now turn to the notion of a random variable. 


Definition 15.2.3 A random variable X on a sample space $\omega$ is a
real-valued function on $\omega$:

$$X : \omega \to \mathbb{R}.$$

For example, we now create a simplistic gambling game which requires two
flips of a coin. Once again let the sample space be

$\omega$ = {(heads, heads), (heads, tails), (tails, heads), (tails, tails)}.

Suppose that, if the first toss of a coin is heads, you win ten dollars. If
it is tails, you lose five dollars. On the second toss, heads will pay fifteen
dollars and tails will cost you twelve dollars. To capture these stakes (for
an admittedly boring game), we define the random variable

$$X : \omega \to \mathbb{R}$$

by

X(heads, heads) = 10 + 15 = 25
X(heads, tails) = 10 − 12 = −2
X(tails, heads) = −5 + 15 = 10
X(tails, tails) = −5 − 12 = −17.
15.3 Independence


Toss a pair of dice, one blue and one red. The number on the blue die
should have nothing to do with the number on the red die. The events
are in some sense independent, or disjoint. We want to take this intuition
of independence and give it a sharp definition.

Before giving a definition for independence, we need to talk about
conditional probability. Start with a sample space $\omega$. We want to understand
the probability for an event A to occur, given that we already know some
other event B has occurred. For example, roll a single die. Let $\omega$ be the
six possible outcomes on this die. Let A be the event that a 4 shows up.
Certainly we have

$$P(A) = \frac{|A|}{|\omega|} = \frac{1}{6}.$$

But suppose someone tells us, before we look at the rolled die, that they
know for sure that on the die there is an even number. Then the probability
that a 4 will occur should be quite different. The set B = {2,4,6} is the
event that an even number occurs. Then the probability that a 4 shows up
should now be 1/3, as there are only three elements in B. Note that

$$\frac{1}{3} = \frac{|A \cap B|}{|B|} = \frac{|A \cap B|/|\omega|}{|B|/|\omega|} = \frac{P(A \cap B)}{P(B)}.$$


This motivates the definition: 


Definition 15.3.1 The conditional probability that A occurs given that B
has occurred is:

$$P(A|B) = \frac{P(A \cap B)}{P(B)}.$$


What should it mean for an event A to be independent from an event
B? At the least, it should mean that knowing about the likelihood of event
B occurring should have no bearing on the likelihood that A occurs, i.e.,
knowing about B should not affect A. Thus if A and B are independent,
we should have

$$P(A|B) = P(A).$$

Using that $P(A|B) = P(A \cap B)/P(B)$, this means that a reasonable definition for
independence is:


Definition 15.3.2 Two events A and B are independent if

$$P(A \cap B) = P(A) \cdot P(B).$$
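
These two definitions are easy to test on the dice examples above. The following is a small sketch using exact fractions and the ‘size of set’ definition of probability; the function names and the particular two-dice events are illustrative choices, not taken from the text.

```python
from fractions import Fraction
from itertools import product


def prob(event, space):
    """'Size of set' probability: |event| / |space|."""
    return Fraction(len(event & space), len(space))


def conditional(a, b, space):
    """P(A|B) = P(A and B) / P(B)."""
    return prob(a & b, space) / prob(b, space)


def independent(a, b, space):
    """Check P(A and B) == P(A) * P(B)."""
    return prob(a & b, space) == prob(a, space) * prob(b, space)


# One die: A = "a 4 shows up", B = "an even number shows up".
die = set(range(1, 7))
four = {4}
even = {2, 4, 6}

# Two dice, recorded as (blue, red).
two_dice = set(product(range(1, 7), repeat=2))
blue_is_4 = {roll for roll in two_dice if roll[0] == 4}
red_is_even = {roll for roll in two_dice if roll[1] % 2 == 0}
```

On the single die, knowing B changes the probability of A from 1/6 to 1/3, so A and B are not independent; on the pair of dice, the blue event and the red event satisfy the product formula.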
15.4 Expected Values and Variance


In a game, how much should you be expected to win in the long run? This 
quantity is the expected value. Further, how likely is it that you might lose 
big time, even if the expected value tells you that you will usually come 
out ahead? This type of information is contained in the variance and in its 
square root, the standard deviation. We start with some definitions. 


Definition 15.4.1 The expected value of a random variable X on a sample
space $\omega$ is:

$$E(X) = \sum_{a \in \omega} X(a) \cdot P(a).$$

For example, recall the simplistic game defined at the end of Section 15.2,
where we flip a coin twice and our random variable represents our
winnings: X(heads, heads) = 10 + 15 = 25, X(heads, tails) = 10 − 12 = −2,
X(tails, heads) = −5 + 15 = 10, and X(tails, tails) = −5 − 12 = −17.
The expected value is simply:

$$E(X) = 25\cdot\frac{1}{4} + (-2)\cdot\frac{1}{4} + 10\cdot\frac{1}{4} + (-17)\cdot\frac{1}{4} = 4.$$

Intuitively, this means that on average you will win four dollars each time
you play the game. Of course, luck might be against you and you could
lose quite a bit.
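
The same computation can be written out directly from the definition. This is a minimal sketch for the two-flip game with a fair coin; the dictionary and function names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Stakes from the game: winnings on the first and on the second toss.
FIRST = {"heads": 10, "tails": -5}
SECOND = {"heads": 15, "tails": -12}

sample_space = list(product(["heads", "tails"], repeat=2))
# Fair coin: each of the four outcomes has probability 1/4.
P = {outcome: Fraction(1, 4) for outcome in sample_space}


def X(outcome):
    """Random variable: total winnings for a pair of flips."""
    first, second = outcome
    return FIRST[first] + SECOND[second]


def expected_value(rv, prob):
    """E(rv) = sum over outcomes a of rv(a) * P(a)."""
    return sum(rv(a) * p for a, p in prob.items())
```

Changing the values in `P` (keeping their sum equal to one) gives the expected winnings for an unfair coin under the second definition of a probability space.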

The expected value can be viewed as a function from the set of all 
random variables to the real numbers. As a function, the expected value is 
linear. 


Theorem 15.4.1 On a probability space, the expected value is linear,
meaning that for all random variables X and Y and all real numbers $\lambda$ and $\mu$,
we have

$$E(\lambda X + \mu Y) = \lambda E(X) + \mu E(Y).$$


Proof: This is a straightforward calculation from the definition of expected
value. We have

$$\begin{aligned}
E(\lambda X + \mu Y) &= \sum_{a \in \omega} (\lambda X + \mu Y)(a) \cdot P(a) \\
&= \sum_{a \in \omega} (\lambda X(a) + \mu Y(a)) \cdot P(a) \\
&= \sum_{a \in \omega} \lambda X(a) \cdot P(a) + \sum_{a \in \omega} \mu Y(a) \cdot P(a) \\
&= \lambda \sum_{a \in \omega} X(a) \cdot P(a) + \mu \sum_{a \in \omega} Y(a) \cdot P(a) \\
&= \lambda E(X) + \mu E(Y). \qquad \Box
\end{aligned}$$


The expected value will only tell a part of the story, though. Consider 
two classes, each with ten students. On a test, in one of the classes five 
people got 100s and five got 50s, while in the other everyone got a 75. In 
both classes the average was a 75 but the performances were quite different. 
Expected value is like the average, but it does not tell us how far from 
the average you are likely to be. For example, in the first class you are 
guaranteed to be 25 points from the average while in the second class you 
are guaranteed to be exactly at the average. There is a measure of how 
likely it is that you are far from the expected value: 


Definition 15.4.2 The variance of a random variable X on a sample space
$\omega$ is

$$V(X) = E[X - E(X)]^2.$$

The idea is we set up a new random variable,

$$[X - E(X)]^2.$$

Note that the expected value E(X) is just a number. The farther X is from
its expected value E(X), the larger is $[X - E(X)]^2$. Thus it is a measure of
how far we can be expected to be from the average. We square X − E(X)
in order to make everything non-negative.

We can think of the variance V as a map from random variables to the 
real numbers. While not quite linear, it is close, as we will now see. First, 
though, we want to show that the formula for variance can be rewritten. 


Lemma 15.4.1 For a random variable X on a probability space, we have

$$V(X) = E(X^2) - [E(X)]^2.$$

Proof: This is a direct calculation. We are interested in the new random
variable

$$[X - E(X)]^2.$$

Now

$$[X - E(X)]^2 = X^2 - 2XE(X) + [E(X)]^2.$$

Since E(X) is just a number and since the expected value, as a map from
random variables to the reals, is linear, we have

$$\begin{aligned}
V(X) &= E[X - E(X)]^2 \\
&= E[X^2 - 2XE(X) + [E(X)]^2] \\
&= E(X^2) - 2E(X)E(X) + [E(X)]^2 \\
&= E(X^2) - [E(X)]^2,
\end{aligned}$$

as desired. □


This will allow us to show that the variance is almost linear. 


Theorem 15.4.2 Let X and Y be any two random variables that are
independent on a probability space and let $\lambda$ be any real number. Then

$$V(\lambda X) = \lambda^2 V(X)$$

and

$$V(X + Y) = V(X) + V(Y).$$

It is the $\lambda^2$ term that prevents the variance from being linear.


Proof: Since the expected value is a linear function, we know that
$E(\lambda X) = \lambda E(X)$. Then

$$\begin{aligned}
V(\lambda X) &= E[(\lambda X)^2] - [E(\lambda X)]^2 \\
&= \lambda^2 E(X^2) - [\lambda E(X)]^2 \\
&= \lambda^2 \left(E(X^2) - [E(X)]^2\right) \\
&= \lambda^2 V(X).
\end{aligned}$$

For the second formula, we will need to use that the independence of X
and Y means that

$$E(XY) = E(X)E(Y).$$

By the above lemma’s description of variance, we have

$$\begin{aligned}
V(X + Y) &= E[(X + Y)^2] - [E(X + Y)]^2 \\
&= E[X^2 + 2XY + Y^2] - [E(X) + E(Y)]^2 \\
&= E[X^2] + 2E[XY] + E[Y^2] - [E(X)]^2 - 2E(X)E(Y) - [E(Y)]^2 \\
&= \left(E[X^2] - [E(X)]^2\right) + \left(2E[XY] - 2E(X)E(Y)\right) + \left(E[Y^2] - [E(Y)]^2\right) \\
&= V(X) + V(Y),
\end{aligned}$$


as desired. O 
A number related to the variance is its square root, the standard devi- 
ation: 


$$\text{standard deviation}(X) = \sigma(X) = \sqrt{V(X)}.$$
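
The lemma and the theorem can be checked exactly on the two-flip game of Section 15.2, where the winnings from the first toss and from the second toss are independent random variables. This is a sketch with a fair coin; the names `X1`, `X2`, `E` and `V` are illustrative.

```python
from fractions import Fraction
from itertools import product

sample_space = list(product(["heads", "tails"], repeat=2))
P = {a: Fraction(1, 4) for a in sample_space}  # fair coin, two flips


def E(rv):
    """Expected value of a random variable on the sample space."""
    return sum(rv(a) * P[a] for a in sample_space)


def V(rv):
    """Variance from the definition: E of the squared distance to the mean."""
    mean = E(rv)
    return E(lambda a: (rv(a) - mean) ** 2)


def X1(a):
    """Winnings from the first toss."""
    return 10 if a[0] == "heads" else -5


def X2(a):
    """Winnings from the second toss."""
    return 15 if a[1] == "heads" else -12


def X(a):
    """Total winnings."""
    return X1(a) + X2(a)
```

Here V(X) agrees with E(X^2) − [E(X)]^2 and with V(X1) + V(X2), while scaling X by 3 multiplies the variance by 9.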
15.5 Central Limit Theorem


In the last section we defined the basic notions of probability in terms of 
counting. Unfortunately, combinatorics can only take us so far. Think 
about flipping a coin. After many flips, we expect that the total number 
of heads should be quite close to one half of the total number of flips. In 
trying to capture this notion of flipping a coin over and over again, we need 
to introduce the following: 


Definition 15.5.1 Repeated independent trials are called Bernoulli trials if 
there are only two possible outcomes for each trial and if their probabilities 
remain the same throughout the trials. 


Let A be one of the outcomes and suppose the probability of A is P(A) = p. 
Then the probability of A not occurring is 1 — p, which we will denote by 
q. Let the sample space be 


$\omega$ = {A, not A}.
We have 


P(A) =p, P(not A) = q. 


We now want to see what happens when we take many repeated trials. 
The following theorem is key: 


Theorem 15.5.1 (Central Limit Theorem) Consider a sample space
$\omega$ = {A, not A} with P(A) = p and P(not A) = 1 − p = q. Given n
independent random variables $X_1, \ldots, X_n$, each taking

$$X_i(A) = 1, \quad X_i(\text{not } A) = 0,$$

set

$$S_n = \sum_{i=1}^{n} X_i$$

and

$$S_n^* = \frac{S_n - E(S_n)}{\sqrt{V(S_n)}}.$$

Then for any real numbers a and b,

$$\lim_{n\to\infty} P\{a \le S_n^* \le b\} = \frac{1}{\sqrt{2\pi}}\int_a^b e^{-x^2/2}\,dx.$$
What this is saying is that if we perform a huge number of repeated
Bernoulli trials, then the values of $S_n$ will be distributed as:

[Figure: bell-shaped distribution of $S_n$.]

But we have even more. Namely, by normalizing $S_n$ to the new random
variable $S_n^*$ (which, as we will see in a moment, has mean zero and variance
one), we always get the same distribution, no matter what the real world
situation we start with is, just as long as the real world problem can be
modelled as a Bernoulli trial. By the way, the distribution for any Bernoulli
trial is simply the graph of the function $\lim_{n\to\infty} S_n$. We call $S_n^*$ the normal
distribution. Its graph is the Gauss-Bell curve.





Before sketching a proof of the Central Limit Theorem (whose general
outline is from [18]), let us look at the random variables $S_n$ and $S_n^*$.


Lemma 15.5.1 The expected value of $S_n$ is np and its variance is npq.
The expected value of $S_n^*$ is 0 and its variance is 1.


Proof of Lemma: We know that for all k,

$$E(X_k) = X_k(A)P(A) + X_k(\text{not } A)P(\text{not } A) = 1\cdot p + 0\cdot q = p.$$
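Then by the linearity of the expected value function

$$\begin{aligned}
E(S_n) &= E(X_1 + \cdots + X_n) \\
&= E(X_1) + \cdots + E(X_n) \\
&= np.
\end{aligned}$$

As for the variance, we know that for any k,

$$\begin{aligned}
V(X_k) &= E(X_k^2) - [E(X_k)]^2 \\
&= X_k^2(A)P(A) + X_k^2(\text{not } A)P(\text{not } A) - p^2 \\
&= 1^2\cdot p + 0^2\cdot q - p^2 \\
&= p - p^2 \\
&= p(1 - p) \\
&= pq.
\end{aligned}$$

Then, since the $X_k$ are independent, we have

$$\begin{aligned}
V(S_n) &= V(X_1 + \cdots + X_n) \\
&= V(X_1) + \cdots + V(X_n) \\
&= npq.
\end{aligned}$$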


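Now

$$\begin{aligned}
E(S_n^*) &= E\left(\frac{S_n - E(S_n)}{\sqrt{V(S_n)}}\right) \\
&= \frac{1}{\sqrt{V(S_n)}}\,E(S_n - E(S_n)) \\
&= \frac{1}{\sqrt{V(S_n)}}\,\bigl(E(S_n) - E(E(S_n))\bigr),
\end{aligned}$$

which, since $E(S_n)$ is just a number, is zero.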

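Now for the variance. First, note that for any random variable that
happens to be a constant function, the variance must be zero. In particular,
since the expected value of a random variable is a number, we must have
that the variance of an expected value is zero:

$$V(E(X)) = 0.$$

Using this, together with the two formulas of Theorem 15.4.2 (a constant
is independent of any random variable), we have that

$$\begin{aligned}
V(S_n^*) &= V\left(\frac{S_n - E(S_n)}{\sqrt{V(S_n)}}\right) \\
&= \frac{1}{V(S_n)}\,V(S_n - E(S_n)) \\
&= \frac{1}{V(S_n)}\,\bigl(V(S_n) + V(-E(S_n))\bigr) \\
&= \frac{V(S_n)}{V(S_n)} = 1,
\end{aligned}$$

as desired. □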
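
For a small number of trials the first half of the lemma can be verified by brute force. The sketch below enumerates all $2^n$ sequences of outcomes, weighting each by the product of the individual probabilities (this is where independence of the trials is used). The function name and the choice n = 8, p = 1/3 in the usage are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product


def moments_of_S(n, p):
    """Return (E(S_n), V(S_n)) by summing over all 2^n outcome sequences."""
    q = 1 - p
    mean = Fraction(0)
    second_moment = Fraction(0)
    for outcome in product([1, 0], repeat=n):   # 1 means A occurred
        s = sum(outcome)                        # value of S_n on this sequence
        weight = p ** s * q ** (n - s)          # probability of this sequence
        mean += s * weight
        second_moment += s * s * weight
    return mean, second_moment - mean ** 2
```

For example, `moments_of_S(8, Fraction(1, 3))` can be compared with np = 8/3 and npq = 16/9.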
Before discussing the proof of the Central Limit Theorem, let us look
at the formula

$$\lim_{n\to\infty} P(a \le S_n^* \le b) = \frac{1}{\sqrt{2\pi}}\int_a^b e^{-x^2/2}\,dx.$$

It happens to be the case that for any particular choice of a and b, it is
impossible to explicitly calculate the integral $\frac{1}{\sqrt{2\pi}}\int_a^b e^{-x^2/2}\,dx$; instead people
must numerically approximate the answers, which of course can easily be
done with software packages like Maple or Mathematica. Surprisingly
enough, $\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-x^2/2}\,dx$ can be shown to be exactly one. We first
show why this must be the case if the Central Limit Theorem is true and
then we will explicitly prove that this integral is one.
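
As an illustration of such a numerical approximation, here is a minimal Python sketch using the composite Simpson rule. The function names and the number of panels are illustrative choices; the comparison function relies on the standard library's `math.erf`, using the known identity that the integral equals $\frac{1}{2}\left(\mathrm{erf}(b/\sqrt{2}) - \mathrm{erf}(a/\sqrt{2})\right)$.

```python
import math


def gauss_density(x):
    """Integrand (1 / sqrt(2 pi)) * exp(-x^2 / 2)."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def simpson(func, a, b, panels=1000):
    """Composite Simpson rule on [a, b]; panels must be even."""
    step = (b - a) / panels
    total = func(a) + func(b)
    for i in range(1, panels):
        total += (4 if i % 2 else 2) * func(a + i * step)
    return total * step / 3


def normal_probability(a, b):
    """Numerical approximation of the integral from a to b."""
    return simpson(gauss_density, a, b)


def normal_probability_erf(a, b):
    """The same quantity via the error function."""
    return 0.5 * (math.erf(b / math.sqrt(2)) - math.erf(a / math.sqrt(2)))
```

Integrating over a wide interval such as [−10, 10] gives a value very close to one, which is consistent with the claim about the integral over the whole real line.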

For any sequence of events and for any n, $S_n^*$ must be some number.
Thus for all n,

$$P(-\infty \le S_n^* \le \infty) = 1,$$

and thus its limit as n goes to infinity must be one, meaning that our
integral is one. Thus if $\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-x^2/2}\,dx$ is not one, the Central Limit
Theorem would not be true. Thus we need to prove that this integral is
one. In fact, the proof that this integral is one is interesting in its own
right.


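Theorem 15.5.2

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-x^2/2}\,dx = 1.$$


Proof: Surprisingly, we look at the square of the integral:

$$\left(\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-x^2/2}\,dx\right)^2 = \left(\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-x^2/2}\,dx\right)\left(\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-x^2/2}\,dx\right).$$

Since the symbol x just denotes what variable we are integrating over, we
can change the x in the second integral to a y without changing the equality:

$$\left(\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-x^2/2}\,dx\right)^2 = \left(\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-x^2/2}\,dx\right)\left(\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-y^2/2}\,dy\right).$$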
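Since the x and the y have nothing to do with each other, we can combine
these two single integrals into one double integral:

$$\begin{aligned}
\left(\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-x^2/2}\,dx\right)^2 &= \frac{1}{2\pi}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-x^2/2}\,e^{-y^2/2}\,dx\,dy \\
&= \frac{1}{2\pi}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-(x^2+y^2)/2}\,dx\,dy,
\end{aligned}$$

which is now a double integral over the real plane. The next trick is to
switch over to polar coordinates, to reduce our integrals to doable ones.
Recall that we have $dx\,dy = r\,dr\,d\theta$ and $x^2 + y^2 = r^2$

[Figure: a point (x,y) in the plane with polar coordinates $(r,\theta)$.]

in polar coordinates. Then we have

$$\begin{aligned}
\left(\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-x^2/2}\,dx\right)^2 &= \frac{1}{2\pi}\int_0^{2\pi}\int_0^{\infty} e^{-r^2/2}\,r\,dr\,d\theta \\
&= \frac{1}{2\pi}\int_0^{2\pi} \left[-e^{-r^2/2}\right]_0^{\infty} d\theta \\
&= \frac{1}{2\pi}\int_0^{2\pi} d\theta \\
&= 1.
\end{aligned}$$

Since the integrand is positive, the integral itself, and not just its square,
must be one, as desired. □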
Proof of Central Limit Theorem: (Again, we got this argument from
[18].) At a critical stage of this proof, there will be a summation of terms
of the form

$$\binom{n}{k}p^k q^{n-k},$$

which we will replace by

$$\frac{1}{\sqrt{2\pi npq}}\,e^{-x_k^2/2},$$

where the $x_k$ will be defined in a moment. We will see that the justification
for this replacement is a corollary of Stirling’s formula for n!, next section’s
topic.
We are interested in $P(a \le S_n^* \le b)$. But, at least initially, the random
variable $S_n$ is a bit easier to work with. We want to link $S_n$ with $S_n^*$.
Suppose that we know that $S_n = k$, which means that after n trials, there
have been exactly k occurrences of A (and thus n − k occurrences of not A).
Let $x_k$ denote the corresponding value for $S_n^*$. Then

$$x_k = \frac{k - E(S_n)}{\sqrt{V(S_n)}}.$$

Since $E(S_n) = np$ and $V(S_n) = npq$, we have

$$x_k = \frac{k - np}{\sqrt{npq}}$$

and thus

$$k = np + \sqrt{npq}\,x_k.$$

Then

$$P(a \le S_n^* \le b) = \sum_{\{a \le x_k \le b\}} P(S_n = k).$$

First we need to show that

$$P(S_n = k) = \binom{n}{k}p^k q^{n-k}.$$


Now $S_n = k$ means that after n trials there are exactly k A’s. Since 
P(A) = p and P(not A) = q, we have that the probability of any particular 
pattern of k A’s is $p^k q^{n-k}$ (for example, if the first k trials yield A’s and the 
last n − k trials yield not A’s). But among n trials, there are $\binom{n}{k}$ different 
ways for there to be k A’s. Thus $P(S_n = k) = \binom{n}{k} p^k q^{n-k}$. 

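Then we have 

$$P(a \le S_n^* \le b) = \sum_{\{a \le x_k \le b\}} \binom{n}{k} p^k q^{n-k}.$$

We now replace $\binom{n}{k} p^k q^{n-k}$ with $\frac{1}{\sqrt{2\pi npq}} e^{-x_k^2/2}$ (which, again, will be justified 
in the next section), giving us 

$$P(a \le S_n^* \le b) = \sum_{\{a \le x_k \le b\}} \frac{1}{\sqrt{2\pi npq}} \, e^{-x_k^2/2} = \sum_{\{a \le x_k \le b\}} \frac{1}{\sqrt{2\pi}} \, \frac{1}{\sqrt{npq}} \, e^{-x_k^2/2}.$$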
Note that 

$$x_{k+1} - x_k = \frac{k + 1 - np}{\sqrt{npq}} - \frac{k - np}{\sqrt{npq}} = \frac{1}{\sqrt{npq}}.$$

Thus 

$$P(a \le S_n^* \le b) = \sum_{\{a \le x_k \le b\}} \frac{1}{\sqrt{2\pi}} \, e^{-x_k^2/2} \, (x_{k+1} - x_k).$$

As we let n approach infinity, the interval [a,b] is split into a finer and finer 
partition by the $x_k$. The above sum is a Riemann sum and can thus be 
replaced, as n approaches infinity, by our desired integral: 

$$\lim_{n \to \infty} P(a \le S_n^* \le b) = \frac{1}{\sqrt{2\pi}} \int_a^b e^{-x^2/2} \, dx. \qquad \Box$$
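
The limit just proved can be checked numerically. The following is a minimal sketch, not part of the original argument: it sums the exact binomial probabilities over those k whose standardized value $x_k$ lies in [a, b] and compares the result with the normal integral. The function names and the choices n = 10000, p = 0.3 are illustrative assumptions; the binomial terms are computed through logarithms (with `math.lgamma`) only to avoid overflow for large n.

```python
import math

def normal_probability(a, b):
    """(1/sqrt(2*pi)) times the integral of exp(-x^2/2) from a to b."""
    return 0.5 * (math.erf(b / math.sqrt(2)) - math.erf(a / math.sqrt(2)))

def standardized_binomial_probability(n, p, a, b):
    """Exact P(a <= S_n^* <= b), where S_n^* = (S_n - n p) / sqrt(n p q)."""
    q = 1.0 - p
    scale = math.sqrt(n * p * q)
    total = 0.0
    for k in range(n + 1):
        x_k = (k - n * p) / scale
        if a <= x_k <= b:
            # log of (n choose k) p^k q^(n-k), computed with lgamma
            log_term = (math.lgamma(n + 1) - math.lgamma(k + 1)
                        - math.lgamma(n - k + 1)
                        + k * math.log(p) + (n - k) * math.log(q))
            total += math.exp(log_term)
    return total

exact = standardized_binomial_probability(10_000, 0.3, -1.0, 1.0)
limit = normal_probability(-1.0, 1.0)
print(exact, limit)
```

For moderate n the two numbers differ noticeably, since the sum is only a Riemann sum for the integral; the gap shrinks as n grows.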


15.6 Stirling’s Approximation for n! 


Stirling’s formula tells us that for large n we can replace n! by $\sqrt{2\pi n} \, n^n e^{-n}$. 
We need this approximation to complete the proof of the Central Limit 
Theorem. (We are still following [18].) 

First, given two functions f(n) and g(n), we say that 

$$f(n) \sim g(n)$$

if there exists a nonzero constant c such that 

$$\lim_{n \to \infty} \frac{f(n)}{g(n)} = c.$$

Thus the functions f(n) and g(n) grow at the same rate as n goes to infinity. 
For example 

$$n^2 \sim 5n^2 - 2n + 3.$$


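Theorem 15.6.1 (Stirling’s Formula) 

$$n! \sim \sqrt{2\pi n} \, n^n e^{-n}$$

Proof: This will take some work and some algebraic manipulations. 
First note that 

$$\sqrt{2\pi n} \, n^n e^{-n} = \sqrt{2\pi} \, n^{n + \frac{1}{2}} e^{-n}.$$

We will show here that 

$$\lim_{n \to \infty} \frac{n!}{n^{n + \frac{1}{2}} e^{-n}} = k,$$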




for some constant k. To show that $k = \sqrt{2\pi}$, we use the following convoluted 
argument. Assume that we have already shown that $n! \sim k \, n^{n + \frac{1}{2}} e^{-n}$. 
Use this approximation in our replacement of $\binom{n}{k} p^k q^{n-k}$ in the following 
corollary and, more importantly, in the proof in the last section of the Central 
Limit Theorem. If we follow the steps in that proof, we will end up 
with 

$$\lim_{n \to \infty} P(a \le S_n^* \le b) = \frac{1}{k} \int_a^b e^{-x^2/2} \, dx.$$

Since for each n, we must have $S_n^*$ equal to some number, we know that 
$P(-\infty \le S_n^* \le \infty) = 1$ and thus $\lim_{n \to \infty} P(-\infty \le S_n^* \le \infty) = 1$. Then we 
must have 

$$\frac{1}{k} \int_{-\infty}^{\infty} e^{-x^2/2} \, dx = 1.$$

But in the last section we calculated that $\int_{-\infty}^{\infty} e^{-x^2/2} \, dx = \sqrt{2\pi}$. From this 
calculation, we see that k must be $\sqrt{2\pi}$. 

Now for the meat of the argument, showing that such a k exists. This 
will take some work and involve various computational tricks. Our goal is 
to show that there is a nonzero constant k such that 

$$\lim_{n \to \infty} \frac{n!}{n^{n + \frac{1}{2}} e^{-n}} = k.$$

Since we have no clue for now as to what k is, save that it is positive, call it 
$e^c$, with c some other constant (we will be taking logarithms in a moment, 
so using $e^c$ will make the notation a bit easier). Now, 

$$\lim_{n \to \infty} \frac{n!}{n^{n + \frac{1}{2}} e^{-n}} = e^c$$

exactly when 

$$\lim_{n \to \infty} \log \left( \frac{n!}{n^{n + \frac{1}{2}} e^{-n}} \right) = c.$$

Using that logarithms change multiplications and divisions into sums and 
differences, this is the same as 

$$\lim_{n \to \infty} \left( \log(n!) - \left(n + \frac{1}{2}\right) \log(n) + n \right) = c.$$

For notational convenience, set 

$$d_n = \log(n!) - \left(n + \frac{1}{2}\right) \log(n) + n.$$
We want to show that $d_n$ converges to some number c as n goes to ∞. Here 
we use a trick. Consider the sequence 

$$\sum_{i=1}^{n} (d_i - d_{i+1}) = (d_1 - d_2) + (d_2 - d_3) + \cdots + (d_n - d_{n+1}) = d_1 - d_{n+1}.$$

We will show that the infinite series $\sum_{i=1}^{\infty} (d_i - d_{i+1})$ converges, which means 
that the partial sums $\sum_{i=1}^{n} (d_i - d_{i+1}) = d_1 - d_{n+1}$ converge. But this will 
mean that $d_{n+1}$ will converge, which is our goal. 

We will show that $\sum_{i=1}^{\infty} (d_i - d_{i+1})$ converges by the comparison test. 
Specifically, we will show that 

$$|d_n - d_{n+1}| \le \frac{2n+1}{2n^3} - \frac{1}{4n^2}.$$

Since both $\sum_{n=1}^{\infty} \frac{2n+1}{2n^3}$ and $\sum_{n=1}^{\infty} \frac{1}{4n^2}$ converge, our series will converge. 
This will be a long calculation. We will need to use that, for any x with 
$|x| < \frac{2}{3}$, 

$$\log(1 + x) = x - \frac{x^2}{2} + \theta(x),$$

where θ(x) is a function such that for all $|x| < \frac{2}{3}$, 

$$|\theta(x)| < |x|^3.$$

This follows from the Taylor series expansion of log(1 + x). The requirement 
that $|x| < \frac{2}{3}$ is not critical; all we must do is make sure that our |x| are 
sufficiently less than one. 

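Now, 

$$|d_n - d_{n+1}| = \left| \left[ \log(n!) - \left(n + \tfrac{1}{2}\right) \log(n) + n \right] - \left[ \log((n+1)!) - \left(n + 1 + \tfrac{1}{2}\right) \log(n+1) + n + 1 \right] \right|$$

$$= \left| \left[ \log(n) + \cdots + \log(1) - \left(n + \tfrac{1}{2}\right) \log(n) + n \right] - \left[ \log(n+1) + \cdots + \log(1) - \left(n + 1 + \tfrac{1}{2}\right) \log(n+1) + n + 1 \right] \right|$$

$$= \left| -\left(n + \tfrac{1}{2}\right) \log(n) + \left(n + \tfrac{1}{2}\right) \log(n+1) - 1 \right|$$

$$= \left| \left(n + \tfrac{1}{2}\right) \log\left(\frac{n+1}{n}\right) - 1 \right|$$

$$= \left| \left(n + \tfrac{1}{2}\right) \log\left(1 + \frac{1}{n}\right) - 1 \right|$$

$$= \left| \left(n + \tfrac{1}{2}\right) \left( \frac{1}{n} - \frac{1}{2n^2} + \theta\left(\frac{1}{n}\right) \right) - 1 \right|$$

$$= \left| \left(n + \tfrac{1}{2}\right) \theta\left(\frac{1}{n}\right) - \frac{1}{4n^2} \right|$$

$$\le \frac{2n+1}{2n^3} - \frac{1}{4n^2},$$

which gives us our result. □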
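
The quantities in this proof are easy to tabulate. The sketch below is an illustration under stated assumptions, not part of the proof: it computes $d_n$ through `math.lgamma` (using log(n!) = lgamma(n + 1)) and the ratio of n! to Stirling’s approximation. The function names are hypothetical.

```python
import math

def d(n):
    """d_n = log(n!) - (n + 1/2) log(n) + n, with log(n!) = lgamma(n + 1)."""
    return math.lgamma(n + 1) - (n + 0.5) * math.log(n) + n

def stirling_ratio(n):
    """n! divided by sqrt(2 pi n) n^n e^(-n), computed through logarithms."""
    log_stirling = 0.5 * math.log(2 * math.pi * n) + n * math.log(n) - n
    return math.exp(math.lgamma(n + 1) - log_stirling)

for n in (1, 10, 100, 1000):
    print(n, d(n), stirling_ratio(n))

# The limit c of the d_n should be log(sqrt(2 pi)).
print(0.5 * math.log(2 * math.pi))
```

The printed values of $d_n$ decrease toward log(√(2π)), and the ratio tends to one, in line with the theorem.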
While Stirling’s formula is important in its own right, we needed to use 
its following corollary in the proof of the Central Limit Theorem: 


Corollary 15.6.1.1 Let A be a constant. Then for $x_k \le A$, we have 

$$\binom{n}{k} p^k q^{n-k} \sim \frac{1}{\sqrt{2\pi npq}} \, e^{-x_k^2/2}.$$

Here the notation is the same as that used in the last section. In particular, 
if $S_n = k$, we set $S_n^* = x_k$. Then we have 

$$k = np + \sqrt{npq} \, x_k,$$

and subtracting both sides of this equation from n, we have 

$$n - k = n - np - \sqrt{npq} \, x_k = nq - \sqrt{npq} \, x_k.$$

If, as in the corollary, $x_k \le A$, then we must have 

$$k \sim np$$

and 

$$n - k \sim nq.$$

In the following proof, at a critical stage we will be replacing k by np and 
n − k by nq. 
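Proof of Corollary: By definition 

$$\binom{n}{k} p^k q^{n-k} = \frac{n!}{k! \, (n-k)!} \, p^k q^{n-k}$$

$$\sim \frac{\left(\frac{n}{e}\right)^n \sqrt{2\pi n}}{\left(\frac{k}{e}\right)^k \sqrt{2\pi k} \, \left(\frac{n-k}{e}\right)^{n-k} \sqrt{2\pi (n-k)}} \, p^k q^{n-k},$$

using Stirling’s formula, which in turn yields 

$$= \sqrt{\frac{n}{2\pi k (n-k)}} \left(\frac{np}{k}\right)^k \left(\frac{nq}{n-k}\right)^{n-k}$$

$$\sim \sqrt{\frac{n}{2\pi (np)(nq)}} \left(\frac{np}{k}\right)^k \left(\frac{nq}{n-k}\right)^{n-k},$$

using here that k ~ np and n − k ~ nq. This in turn equals 

$$\frac{1}{\sqrt{2\pi npq}} \left(\frac{np}{k}\right)^k \left(\frac{nq}{n-k}\right)^{n-k}.$$

If we can show that 

$$\left(\frac{np}{k}\right)^k \left(\frac{nq}{n-k}\right)^{n-k} \sim e^{-x_k^2/2},$$

we will be done. Using that we can replace log(1 + x) by $x - \frac{x^2}{2}$ for small 
x, we will show that 

$$\log \left( \left(\frac{np}{k}\right)^k \left(\frac{nq}{n-k}\right)^{n-k} \right) \sim -\frac{x_k^2}{2}.$$

Now 

$$\log \left( \left(\frac{np}{k}\right)^k \left(\frac{nq}{n-k}\right)^{n-k} \right) = k \log\left(\frac{np}{k}\right) + (n-k) \log\left(\frac{nq}{n-k}\right)$$

$$= k \log\left(1 - \frac{\sqrt{npq} \, x_k}{k}\right) + (n-k) \log\left(1 + \frac{\sqrt{npq} \, x_k}{n-k}\right),$$

using that the equality $k = np + \sqrt{npq} \, x_k$ implies 

$$\frac{np}{k} = \frac{k - \sqrt{npq} \, x_k}{k} = 1 - \frac{\sqrt{npq} \, x_k}{k}$$

and a similar argument for the (n − k). But then we can replace the log 
terms in the above to get 

$$k \left( -\frac{\sqrt{npq} \, x_k}{k} - \frac{npq \, x_k^2}{2k^2} \right) + (n-k) \left( \frac{\sqrt{npq} \, x_k}{n-k} - \frac{npq \, x_k^2}{2(n-k)^2} \right)$$

$$= -\frac{npq \, x_k^2}{2k} - \frac{npq \, x_k^2}{2(n-k)}$$

$$= -\frac{npq \, x_k^2}{2} \left( \frac{1}{k} + \frac{1}{n-k} \right)$$

$$= -\frac{npq \, x_k^2}{2} \left( \frac{n}{k(n-k)} \right)$$

$$\sim -\frac{npq \, x_k^2}{2} \left( \frac{n}{(np)(nq)} \right) = -\frac{x_k^2}{2},$$

since earlier we showed that np ~ k and nq ~ n − k. □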

The proof of Stirling’s formula and of its corollary were full of clever 
manipulations. Part of the reason that these steps are shown here is to 
let people see that despite the abstract machinery of modern mathematics, 
there is still a need for cleverness at computations. 


15.7 Books 


From informed sources, Brualdi’s book [14] is a good introduction to com- 
binatorics. An excellent, but hard, text is by van Lint and Wilson [115]. 
Cameron’s text [16] is also good. Polya, Tarjan and Woods’ book [93] is 
fascinating. To get a feel of how current combinatorics is used, Graham, 
Knuth and Patashnik’s [47] book is great. Stanley’s text [105] is a standard 
text for beginning graduate students in combinatorics. 

For probability theory, it is hard to imagine a better text than Feller 
[34]. This book is full of intuitions and wonderful, nontrivial examples. 
Grimmett and Stirzaker [50] is also a good place to begin. Another good 
source is Chung’s book [18], which is where, as mentioned, I got the flow of 
the above argument for the Central Limit Theorem. More advanced work 
in probability theory is measure theoretic. 


15.8 Exercises 


1. The goal of this exercise is to see how to apply the definitions for prob- 
ability to playing cards. 

a. Given a standard deck of fifty-two cards, how many five card hands 
are possible (here order does not matter). 

b. How many of these five card hands contain a pair? (This means 
that not only must there be a pair in the hand, but there cannot be a 
three-of-a-kind, two pair, etc.) 

c. What is the probability of being dealt a hand with a pair? 

2. The goal of this exercise is to see how the formulas for $\binom{n}{k}$ are linked to 
Pascal’s triangle. 
a. Prove by induction that 

$$\binom{n}{k} = \binom{n-1}{k} + \binom{n-1}{k-1}.$$

b. Prove this formula by counting how to choose k objects from n 
objects (order not mattering) in two different ways. 

c. Prove that the binomial coefficients $\binom{n}{k}$ can be determined from 
Pascal’s triangle, whose first five rows are: 

        1
       1 1
      1 2 1
     1 3 3 1
    1 4 6 4 1

d. Give a combinatorial proof of the identity 

$$\sum_{k=0}^{n} \binom{n}{k} = 2^n.$$


4. Find a formula for determining how many monomials of degree k can 
be made out of n variables. (Thus for the two variables x, y, the number of 
monomials of degree two is three, since we can simply count the list 

$$(x^2, \; xy, \; y^2).)$$


5. The pigeonhole principle states: 
If (n+1) objects are placed into n different boxes, at least one box must have 
at least two objects in it. 

Let $a_1, \ldots, a_{n+1}$ be integers. Show that there is at least one pair of 
these integers such that $a_i - a_j$ is divisible by the integer n. 
6. The goal of this problem is to prove the Inclusion-Exclusion Principle, 
the statement of which is part c. 

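a. Let A and B be any two sets. Show that 

$$|A \cup B| = |A| + |B| - |A \cap B|.$$

b. Let $A_1$, $A_2$ and $A_3$ be any three sets. Show that 

$$|A_1 \cup A_2 \cup A_3| = |A_1| + |A_2| + |A_3| - |A_1 \cap A_2| - |A_1 \cap A_3| - |A_2 \cap A_3| + |A_1 \cap A_2 \cap A_3|.$$

c. Let $A_1, \ldots, A_n$ be any n sets. Show that 

$$|A_1 \cup \cdots \cup A_n| = \sum |A_i| - \sum |A_i \cap A_j| + \cdots + (-1)^{n+1} |A_1 \cap \cdots \cap A_n|.$$

7. Show that 

$$\binom{2n}{n} \sim (\pi n)^{-1/2} \, 2^{2n}.$$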

Chapter 16 


Algorithms 


Basic Object: Graphs and Trees 
Basic Goal: Computing the Efficiency of Algorithms 


The end of the 1800s and the beginning of the 1900s saw intense debate 
about the meaning of existence for mathematical objects. To some, a math- 
ematical object could only have meaning if there was a method to compute 
it. For others, any definition that did not lead to a contradiction would be 
good enough to guarantee existence (and this is the path that mathemati- 
cians have overwhelmingly chosen to take). Think back to the section on 
the Axiom of Choice in Chapter Ten. Here objects were claimed to exist 
which were impossible to actually construct. In many ways these debates 
had quieted down by the 1930s, in part due to Gödel’s work, but also in part 
due to the nature of the algorithms that were eventually being produced. 
By the late 1800s, the objects that were being supposedly constructed by 
algorithms were so cumbersome and time-consuming, that no human could 
ever compute them by hand. To most people, the pragmatic difference 
between an existence argument versus a computation that would take a 
human the life of the universe was too small to care about, especially if the 
existence proof had a clean feel. 

All of this changed with the advent of computers. Suddenly, calcula- 
tions that would take many lifetimes by hand could be easily completed in 
millionths of a second on a personal computer. Standard software pack- 
ages like Mathematica and Maple can outcompute the wildest dreams of 
a mathematician from just a short time ago. Computers, though, seem to 
have problems with existence proofs. The need for constructive arguments 
returned with force, but now came a real concern with the efficiency of the 
construction, or the complexity of the algorithm. The idea that certain 
constructions have an intrinsic complexity has increasingly become basic in 
most branches of mathematics. 


16.1 Algorithms and Complexity 


An accurate, specific definition for an algorithm is non-trivial and not very 
enlightening. As stated in the beginning of Cormen, Leiserson and Rivest’s 
book Introduction to Algorithms [22], 


Informally, an algorithm is any well-defined computational procedure that 
takes some value, or set of values, as input and produces some values, or 
set of values, as output. An algorithm is thus a sequence of computational 
steps that transform the input into the output. 


Much of what has been discussed in this book can be recast into the 
language of algorithms. Certainly, much of the first chapter on linear alge- 
bra, such as the definition of the determinant and Gaussian elimination, is 
fundamentally algorithmic in nature. 

We are concerned with the efficiency of an algorithm. Here we need to 
be concerned with asymptotic bounds on the growth of functions. 


Definition 16.1.1 Let f(x) and g(x) be two one-variable real-valued functions. 
We say that f(x) is in O(g(x)) if there exists a positive constant C 
and a positive number N so that for all x > N, we have $|f(x)| \le C|g(x)|$. 


This is informally known as big O notation. 

Typically we do not use the symbol “x” for our variable but “n”. Then 
the class of functions in O(n) will be those that grow at most linearly, those 
in $O(n^2)$ grow at most quadratically, etc. Thus the polynomial $3n^4 + 7n - 19$ 
is in $O(n^4)$. 

For an algorithm there is the input size, n, which is how much informa- 
tion needs to be initially given, and the running time, which is how long 
the algorithm takes as a function of the input size. An algorithm is linear 
if the running time r(n) is in O(n), polynomial if the running time r(n) is 
in $O(n^k)$ for some integer k, etc. 
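
To make the definition concrete, here is a small sketch that tests a proposed pair of constants C and N on a finite range of integers. The helper name `is_bounded` and the constants are illustrative assumptions; a finite check is evidence for membership in O(g(n)), not a proof.

```python
def is_bounded(f, g, C, N, n_max=10_000):
    """Check |f(n)| <= C * |g(n)| for every integer n with N < n <= n_max."""
    return all(abs(f(n)) <= C * abs(g(n)) for n in range(N + 1, n_max + 1))

def f(n):
    return 3 * n**4 + 7 * n - 19

# C = 4 and N = 1 work for g(n) = n^4 on the tested range ...
print(is_bounded(f, lambda n: n**4, C=4, N=1))
# ... but the same constants already fail for g(n) = n^3.
print(is_bounded(f, lambda n: n**3, C=4, N=1))
```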

There are further concerns, such as the space size of an algorithm, which 
is how much space the algorithm requires in order to run as a function of 
the input size. 


16.2 Graphs: Euler and Hamiltonian Circuits 


An analysis of most current algorithms frequently comes down to study- 
ing graphs. This section will define graphs and then discuss graphs that 
have Euler circuits and Hamiltonian circuits. We will see that while these 
two have similar looking definitions, their algorithmic properties are quite 
different. 

Intuitively a graph looks like: 


[Figure: several examples of graphs]


The key is that a graph consists of vertices and edges between vertices. 
All that matters is which vertices are linked by edges. Thus we will want 
these two graphs, which have different pictures in the plane, to be viewed 
as equivalent. 


[Figure: two different drawings of the same graph]


Definition 16.2.1 A graph G consists of a set V(G), called vertices, and 
a set E(G), called edges, and a function 

$$\sigma : E(G) \to \{\{u, v\} : u, v \in V(G)\}.$$

We say that elements $v_i$ and $v_j$ in V(G) are connected by an edge e if 
$\sigma(e) = \{v_i, v_j\}$. 

Note that $\{v_i, v_j\}$ denotes the set consisting of the two vertices $v_i$ and $v_j$. 
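For the graph G: 

[Figure: a triangle with vertices v_1, v_2, v_3, in which e_1 joins v_1 and v_2, e_2 joins v_2 and v_3, and e_3 joins v_1 and v_3]

we have 

$$V(G) = \{v_1, v_2, v_3\}, \qquad E(G) = \{e_1, e_2, e_3\}$$

and 

$$\sigma(e_1) = \{v_1, v_2\}, \quad \sigma(e_2) = \{v_2, v_3\}, \quad \sigma(e_3) = \{v_1, v_3\}.$$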


Associated to a graph is its adjacency matrix A(G). If there are n 
vertices, this will be the following n × n matrix. List the vertices: 

$$V(G) = \{v_1, v_2, \ldots, v_n\}.$$

For the (i, j)-entry of the matrix, put in a k if there are k edges between $v_i$ 
and $v_j$ and a 0 otherwise. Thus the adjacency matrix for: 

[Figure: a graph on the vertices v_1, v_2, v_3, v_4]

will be the 4 × 4 matrix: 

$$A(G) = \begin{pmatrix} 0 & 2 & 0 & 0 \\ 2 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 1 \end{pmatrix}.$$

The ‘1’ in the (4, 4) entry reflects that there is an edge from $v_4$ to itself and 
the ‘2’ in the (1, 2) and (2, 1) entries reflects that there are two edges from 
$v_1$ to $v_2$. 
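
The rule for filling in the adjacency matrix can be sketched in a few lines. This is an illustration, not a definitive implementation: vertices are numbered from 0 rather than 1, each edge is given as the pair of its endpoints, and the function name is a hypothetical choice. An edge from a vertex to itself adds one to the diagonal entry, as in the example above.

```python
def adjacency_matrix(num_vertices, edges):
    """Build A(G): entry (i, j) counts the edges between vertex i and vertex j."""
    A = [[0] * num_vertices for _ in range(num_vertices)]
    for u, v in edges:
        A[u][v] += 1
        if u != v:          # an edge from a vertex to itself is counted once
            A[v][u] += 1
    return A

# The triangle with sigma(e1) = {v1, v2}, sigma(e2) = {v2, v3}, sigma(e3) = {v1, v3},
# with v1, v2, v3 renumbered as 0, 1, 2.
triangle = [(0, 1), (1, 2), (0, 2)]
for row in adjacency_matrix(3, triangle):
    print(row)
```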

A path in a graph G is a sequence of edges that link up with each other. 
A circuit is a path that starts and ends at the same vertex. For example, 
in the graph: 


[Figure: a graph with vertices v_1, ..., v_5 and edges e_1, ..., e_7]

the path $e_6 e_7$ starts at vertex $v_1$ and ends at $v_4$ while $e_1 e_2 e_3 e_4 e_5$ is a circuit 
starting and ending at $v_1$. 

We can now start to talk about Euler circuits. We will follow the traditional 
approach and look first at the Königsberg bridge problem. The town 
of Königsberg had the following arrangement: 

[Figure: the bridges of Königsberg joining the pieces of land A, B, C and D]

Here A, B, C and D denote land. 

The story goes that in the 1700s, the people of Königsberg would try 
to see if they could cross every bridge exactly once so that at the end they 
returned to their starting spot. Euler translated this game into a graph 
theory question. To each connected piece of land he assigned a vertex and 
to each bridge between pieces of land he assigned an edge. Thus Königsberg 
became the graph 


[Figure: the Königsberg graph on the vertices A, B, C and D]


Then the game will be solved if in this graph there is a circuit that contains 
each edge exactly once. Such circuits have a special name, in honor of 
Euler: 


Definition 16.2.2 An Euler circuit on a graph is a circuit that contains 
each edge exactly once. 


To solve the Königsberg bridge problem, Euler came up with a clean crite- 
rion for when any graph will have an Euler circuit. 


Theorem 16.2.1 A graph has an Euler circuit if and only if each vertex 
has an even number of edges coming into it. 
Thus in Königsberg, since vertex A is on three edges (and in this case every 
other vertex also has an odd number of edges), no one can cross each bridge 
just once. 

The fact that each vertex must be on an even number of edges is not 
that hard to see. Suppose we have an Euler circuit. Imagine deleting each 
edge as we traverse the graph. Each time we enter, then leave, a vertex, 
two edges are deleted, reducing the number of edges containing that vertex 
by two. By the end, there are no edges left, meaning that the original 
number of edges at each vertex had to be even. 

The reverse direction is a bit more complicated but is more important. 
The best method (which we will not do) is to actually construct an algo- 
rithm that produces an Euler circuit. For us, the important point is that 
there is a clean, easy criterion for determining when an Euler circuit exists. 
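
Euler’s criterion is cheap to test: count the edges at each vertex and check that every count is even. The sketch below assumes the graph is connected and is given as a list of edges; the function names and the labeling of the Königsberg graph (A, C and D as land masses on three edges each, B on five) are assumptions made to match the statement that vertex A is on three edges.

```python
from collections import Counter

def degrees(edges):
    """Number of edge ends at each vertex (an edge from a vertex to itself counts twice)."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def satisfies_euler_criterion(edges):
    """True when every vertex lies on an even number of edges."""
    return all(count % 2 == 0 for count in degrees(edges).values())

# Seven bridges, under an assumed labeling of the four pieces of land.
konigsberg = [("A", "B"), ("A", "B"), ("B", "C"), ("B", "C"),
              ("A", "D"), ("B", "D"), ("C", "D")]
print(dict(degrees(konigsberg)))
print(satisfies_euler_criterion(konigsberg))
```

The check runs in time proportional to the number of edges, which is the sense in which the criterion is clean and easy.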

Let us now make a seemingly minor change in the definition for an Euler 
circuit. Instead of finding a circuit that contains each edge only once, now 
let us try to find one that contains each vertex only once. These circuits 
are called: 


Definition 16.2.3 A graph has a Hamiltonian circuit if there is a circuit 
that contains each vertex exactly once. 


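For example, for the graph: 

[Figure: a graph whose four edges e_1, e_2, e_3, e_4 form a single circuit]

the circuit $e_1 e_2 e_3 e_4$ is Hamiltonian, while for the graph: 

[Figure: a graph with no Hamiltonian circuit]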


there is no Hamiltonian circuit. In this last graph, one can simply list all 
possible circuits and then just check if one of them is Hamiltonian. This 
algorithm of just listing all possible circuits will work for any graph, as 
there can only be a finite number of circuits, but this listing unfortunately 
takes O(n!) time, where n is the number of edges. For any graph with a fair 
number of edges, this approach is prohibitively time-consuming. But this 
is fairly close to the best known method for determining if a Hamiltonian 
circuit exists. As we will see in section four, the problem of finding a 
Hamiltonian circuit seems to be intrinsically difficult and important. 
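
The brute-force search described above can be sketched as follows. As a simplifying assumption, this version lists orderings of the vertices rather than all circuits, and it treats the graph as simple (at most one edge between two vertices, at least three vertices); the function name is hypothetical. The running time still grows factorially.

```python
from itertools import permutations

def has_hamiltonian_circuit(vertices, edges):
    """Try every cyclic ordering of the vertices; factorial time in general."""
    vertices = list(vertices)
    if len(vertices) < 3:
        return False
    adjacent = {frozenset(e) for e in edges}
    first, rest = vertices[0], vertices[1:]
    for order in permutations(rest):
        cycle = [first, *order, first]
        # every consecutive pair in the cycle must be joined by an edge
        if all(frozenset((a, b)) in adjacent for a, b in zip(cycle, cycle[1:])):
            return True
    return False

square = [(1, 2), (2, 3), (3, 4), (4, 1)]
path = [(1, 2), (2, 3), (3, 4)]
print(has_hamiltonian_circuit([1, 2, 3, 4], square))
print(has_hamiltonian_circuit([1, 2, 3, 4], path))
```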
16.3 Sorting and Trees 


Suppose you are given a set of real numbers. Frequently you want to 
order the set from smallest number to largest. Similarly, suppose a stack 
of exams is sitting on your desk. You might want to put the exams into 
alphabetical order. Both of these problems are sorting problems. A sorting 
algorithm will take a collection of elements for which an ordering can exist 
and actually produce the ordering. This section will discuss how this is 
related to a special class of graphs called trees and that the lower bound 
for any sorting algorithm is O(n log(n)). 

Technically a tree is any graph that is connected (meaning that there 
is a path from any vertex to any other vertex) and contains within it no 
circuits. Thus 

[Figure: examples of trees]

are trees while 

[Figure: examples of graphs containing circuits]

are not. Those vertices contained on exactly one edge are called leaves. 
These are in some sense the vertices where the tree stops. We will be 
concerned with binary trees, which are constructed as follows. Start with a 
vertex called the root. Let two edges come out from the root. From each of 
the two new vertices at the end of the two edges, either let two new edges 
stem out or stop. Continue this process a finite number of steps. Such a 
tree looks like: 
[Figure: a binary tree on the vertices v_1, ..., v_13]

where $v_1$ is the root and $v_4$, $v_5$, $v_7$, $v_9$, $v_{10}$, $v_{12}$ and $v_{13}$ are the leaves. We 
will draw our binary trees top down, with the root at the top and the leaves 
at the bottom. At each vertex, the two edges that stem down are called 
the left edge and right edge, respectively. The two vertices at the ends of 
these edges are called the left child and the right child, respectively. The 
height of a tree is the number of edges in the longest path from the root to 
a leaf. Thus the height of 

[Figure: a binary tree of height three]

is three while the height of 

[Figure: a binary tree of height six]

is six. 
We now want to see why sorting is linked to binary trees. We are given 
a collection of elements $\{a_1, \ldots, a_n\}$. We will assume that all we can do is 
compare the size of any two elements. Thus given, say, elements $a_i$ and $a_j$, 
we can determine if $a_i < a_j$ or if $a_j < a_i$. Any such sorting algorithm can 
only, at each stage, take two elements $a_i$ and $a_j$ and, based on which is larger, tell 
us what to do at the next stage. We now show that any such algorithm can 
be represented as a tree. The root will correspond to the first pair to be 
compared in the algorithm. Say this first pair is $a_i$ and $a_j$. There are two 
possibilities for the order of $a_i$ and $a_j$. If $a_i < a_j$, go down the left edge 
and if $a_j < a_i$, go down the right edge. An algorithm will tell us at this 
stage which pair of elements to now compare. Label the new vertices by 
these pairs. Continue this process until there is nothing left to compare. 
Thus we will have a tree, with each vertex labeled by a pair of elements in 
our set and each leaf corresponding to an ordering of the set. 

For example, take a three element set {a1,a2,a3}. Consider the fol- 
lowing simple algorithm (if anything this easy deserves to be called an 
algorithm): 

Compare $a_1$ and $a_2$. If $a_1 < a_2$, compare $a_2$ and $a_3$. If $a_2 < a_3$, then the 
ordering is $a_1 < a_2 < a_3$. If $a_3 < a_2$, compare $a_1$ and $a_3$. If $a_1 < a_3$, 
then the ordering is $a_1 < a_3 < a_2$. If we had $a_3 < a_1$, then the ordering 
is $a_3 < a_1 < a_2$. Now we go back to the case when $a_2 < a_1$. Then we 
next compare $a_1$ and $a_3$. If $a_1 < a_3$, the ordering is $a_2 < a_1 < a_3$. If 
we have $a_3 < a_1$, we compare $a_2$ and $a_3$. If $a_2 < a_3$, then the ordering is 
$a_2 < a_3 < a_1$. If $a_3 < a_2$, then the ordering is $a_3 < a_2 < a_1$ and we are 
done. Even for this simple example, the steps, presented in this manner, 
are confusing. But when this method is represented as a tree it becomes 
clear: 

[Figure: the decision tree for this algorithm. The root compares a_1 and a_2, each internal vertex is labeled by the pair being compared, and the six leaves are the six possible orderings of a_1, a_2, a_3.]
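
The three-element procedure above translates directly into nested comparisons, with each `if` playing the role of a vertex of the tree and each `return` a leaf. This is a sketch under the assumption that the three values are distinct; the function name and the returned comparison count are illustrative additions.

```python
def sort_three(a1, a2, a3):
    """Sort three distinct values; return (ordering, number of comparisons used)."""
    if a1 < a2:
        if a2 < a3:
            return (a1, a2, a3), 2
        if a1 < a3:
            return (a1, a3, a2), 3
        return (a3, a1, a2), 3
    # here a2 < a1
    if a1 < a3:
        return (a2, a1, a3), 2
    if a2 < a3:
        return (a2, a3, a1), 3
    return (a3, a2, a1), 3

orderings = [(1, 2, 3), (1, 3, 2), (2, 1, 3), (2, 3, 1), (3, 1, 2), (3, 2, 1)]
for values in orderings:
    print(values, sort_three(*values))
```

Every input reaches a leaf after at most three comparisons, which is the height of the corresponding tree.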


We now want to show that for a binary tree there is an intrinsic lower 
bound on its height, which means that there is an intrinsic lower bound on 
the time needed to sort. 


Theorem 16.3.1 A binary tree of height n has at most $2^n$ leaves. 


Proof: By induction. Suppose the height is zero. This means that the tree 
is a single vertex and thus has $2^0 = 1$ leaf, which of course in this case is 
also the root and is easy to sort. 

Now suppose that we know the theorem is true for any tree of height 
n − 1. Look at a tree of height n. Thus there is at least one path from 
the root to a leaf with length n. Remove all leaves, and their attaching 
edges, that are of length n from the root. We have a new tree of height 
n − 1. The induction hypothesis kicks in, so we know that for this new tree 
there are at most $2^{n-1}$ leaves. Let two edges stem out from each of these 
$2^{n-1}$ leaves, forming still another new tree which has height n and which 
contains our original tree. But we are adding two new vertices for each of 
the $2^{n-1}$ leaves of the tree of height n − 1. Thus this final new tree has at 
most $2 \cdot 2^{n-1} = 2^n$ leaves. Since each leaf of our original tree is a leaf of 
this tree, we have our result. □

This allows us to finally see that any algorithm that sorts n objects 
must be in at least O(n log(n)). 


Theorem 16.3.2 Any sorting algorithm based on pairwise comparisons 
must be in at least O(n log(n)). 


Proof: Given a set of n elements, there are n! different ways they can be 
initially ordered. For any sorting algorithm, for the corresponding tree there 
must be a way, starting with the root, to get to one of these n! different 
initial orderings. Thus the tree must have at least n! leaves. Thus from the 
previous theorem, the tree must have height at least h, where 

$$2^h \ge n!.$$

Thus we must have 

$$h \ge \log_2(n!).$$

Any sorting algorithm must take at least h steps and hence must be in at 
least $O(\log_2(n!))$. Now we have, for any number K, $\log(K) = \log(2) \log_2(K)$, 
where of course, log is here the natural log, $\log_e$. Further, by Stirling’s formula, 
we have for large n that 

$$n! \sim \sqrt{2\pi n} \, n^n e^{-n}.$$

Then 

$$\log(n!) \sim \log(\sqrt{2\pi n}) + n \log(n) - n \log(e),$$

which gives us that 

$$O(\log(n!)) = O(\log(\sqrt{2\pi n}) + n \log(n) - n \log(e)) = O(n \log(n)),$$

since n log(n) dominates the other terms. Thus the complexity of any 
sorting algorithm is in at least $O(\log_2(n!))$, which equals $O(n \log(n))$, as 
desired. □
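
The bound $h \ge \log_2(n!)$ can be tabulated directly. The sketch below, with a hypothetical function name, computes the smallest integer h with $2^h \ge n!$ and prints it next to $n \log_2(n)$ for comparison.

```python
import math

def min_comparisons(n):
    """Smallest integer h with 2^h >= n!, a lower bound on worst-case comparisons."""
    return math.ceil(math.log2(math.factorial(n)))

for n in (3, 4, 5, 10, 100):
    print(n, min_comparisons(n), round(n * math.log2(n), 1))
```

For n = 3 the bound is three comparisons, matching the height of the tree for the three-element algorithm above.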

To show that the complexity of sorting is actually O(n log(n)), we would need 
to find an algorithm that runs in O(n log(n)). Heapsort, merge sort and other 
sorting algorithms do exist that are in O(n log(n)). 
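
As one illustration, here is a sketch of merge sort that also counts the pairwise comparisons it makes. It is a standard textbook version written for this example, not taken from the sources cited in this chapter; the function name and the returned count are assumptions of the sketch.

```python
def merge_sort(items):
    """Return (sorted list, number of pairwise comparisons used)."""
    if len(items) <= 1:
        return list(items), 0
    mid = len(items) // 2
    left, left_count = merge_sort(items[:mid])
    right, right_count = merge_sort(items[mid:])
    merged, comparisons = [], left_count + right_count
    i = j = 0
    # repeatedly compare the smallest remaining elements of the two halves
    while i < len(left) and j < len(right):
        comparisons += 1
        if right[j] < left[i]:
            merged.append(right[j])
            j += 1
        else:
            merged.append(left[i])
            i += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, comparisons

result, count = merge_sort([5, 2, 9, 1, 5, 6])
print(result, count)
```

Each level of the recursion uses fewer than n comparisons and there are about $\log_2(n)$ levels, which is where the n log(n) running time comes from.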


16.4 P=NP? 


The goal of this section is to discuss what is possibly the most important 
open problem in mathematics: “P=NP?”. This problem focuses on trying 
to determine the difference between the finding of a solution for a problem 
and the checking of a candidate solution for the problem. The fact that it 
remains open (and that it could well be independent of the other axioms 
of mathematics) shows that mathematicians do not yet understand the full 
meaning of mathematical existence versus construction. 

A problem is in polynomial time if, given input size n, there is an 
algorithm that solves it and that is in $O(n^k)$, for some positive integer k. A problem is in NP 
if, given input size n, a candidate solution can be checked for accuracy in 
polynomial time. The NP is sometimes misread as “not polynomial”; it in fact stands for 
“nondeterministic polynomial time”. 
Think of a jigsaw puzzle. While it can be quite time consuming to put 
a jigsaw puzzle together, it is easy and quick to tell if someone has finished 
such a puzzle. For a more mathematical example, try to invert an n x n 
matrix A. While doable, it is not particularly easy to actually construct 
$A^{-1}$. But if someone hands us a matrix B and claims that it is the inverse, 
all we have to do to check is to multiply out AB and see if we get the 
identity I. For another example, start with a graph G. It is difficult to 
determine if G contains a Hamiltonian circuit. But if someone hands us a 
candidate circuit, it is easy to check whether or not the circuit goes through 
every vertex exactly once. Certainly it appears that the problem of finding 
a solution should be intrinsically more difficult than the problem of checking 
the accuracy of a solution. 
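
Checking a candidate Hamiltonian circuit takes only one pass over the candidate. The sketch below assumes a simple graph with at least three vertices and a candidate given as the list of vertices in visiting order, without repeating the starting vertex at the end; the function name is hypothetical.

```python
def is_hamiltonian_circuit(vertices, edges, candidate):
    """Check, in time polynomial in the input size, that candidate visits every
    vertex exactly once and that consecutive vertices are joined by an edge."""
    if len(candidate) != len(vertices) or set(candidate) != set(vertices):
        return False
    adjacent = {frozenset(e) for e in edges}
    n = len(candidate)
    # the last vertex must also be joined back to the first
    return all(frozenset((candidate[i], candidate[(i + 1) % n])) in adjacent
               for i in range(n))

square = [(1, 2), (2, 3), (3, 4), (4, 1)]
print(is_hamiltonian_circuit([1, 2, 3, 4], square, [1, 2, 3, 4]))
print(is_hamiltonian_circuit([1, 2, 3, 4], square, [1, 3, 2, 4]))
```

Contrast this with a search over all orderings of the vertices: the check is fast, while no comparably fast method of finding a circuit is known.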

Amazingly enough, people do not know if the class of NP problems is 
larger than the class of polynomial time problems (which are denoted as P 
problems). “P=NP” is the question: 

Is the class of problems in P equal to the class of problems in NP? 

This has been open for many years. While initially the smart money 
was on P≠NP, today the belief is increasingly that the statement ‘P=NP’ is 
independent of the other axioms of mathematics. Few believe that P=NP. 

Even more intriguing is the existence of NP complete problems. Such a 
problem is not only in NP but also must be a yes/no question and, most 
importantly, every other NP problem must be capable of being translated 
into this problem in polynomial time. Thus if there is a polynomial time 
solution to this NP yes/no problem, there will be a polynomial time solution 
of every NP problem. 

Every area of math seems to have its own NP complete problems. For 
example, the question of whether or not a graph contains a Hamiltonian cir- 
cuit is a quintessential NP complete problem and, since it can be explained 
with little high level math, is a popular choice in expository works. 


16.5 Numerical Analysis: Newton’s Method 


Since the discovery of calculus, there has been work on finding answers to 
math questions that people can actually use. Frequently this comes down 
to only finding approximate solutions. Numerical Analysis is the field that 
tries to find approximate solutions to exact problems. How good of an 
approximation is good enough and how quickly the approximation can be 
found are the basic questions for a numerical analyst. While the roots of 
this subject are centuries old, the rise of computers has revolutionized the 
field. An algorithm that is unreasonable to perform by hand can often 
be easily carried out on a standard computer. Since numerical analysis is ultimately 
concerned with the efficiency of algorithms, I have put this section 




in this chapter. It must be noted that in the current math world, numer- 
ical analysts and people in complexity theory are not viewed as being in 
the same subdiscipline. This is not to imply that they don’t talk to each 
other; more that complexity theory has evolved from computer science and 
numerical analysis has always been a part of mathematics. 

There are certain touchstone problems in numerical analysis, problems 
that are returned to again and again. Certainly efficient algorithms for 
computations in linear algebra are always important. Another, which we 
will be concerned with here, is the problem of finding zeros of functions. 
Many problems in math can be recast into finding a zero of a function. 
We will first look at Newton’s method for approximating a zero of a real 
valued differentiable function f : R → R, and then quickly see how the 
ideas behind this method can be used, at times, to approximate the zeros 
of other types of functions. 

Let f : R → R be a differentiable function. We will first outline the 
geometry behind Newton’s method. Suppose we know its graph (which of 
course in real life we will rarely know; otherwise the problem of approximating 
zeros would be easy) to be: 

[Figure: the graph of y = f(x), crossing the x-axis at x_0]

We thus want to approximate the point $x_0$. Choose any point $x_1$. Draw 
the tangent line to the curve y = f(x) at the point $(x_1, f(x_1))$ and label its 
intersection with the x-axis by $(x_2, 0)$. 





[Figure: the tangent line to y = f(x) at the point (x_1, f(x_1)), with slope f'(x_1), meeting the x-axis at (x_2, 0)]

Then we have 

$$f'(x_1) = \frac{0 - f(x_1)}{x_2 - x_1},$$

which, solving for $x_2$, yields 

$$x_2 = x_1 - \frac{f(x_1)}{f'(x_1)}.$$


In the picture, it looks like our newly constructed $x_2$ is closer to our desired 
$x_0$ than is $x_1$. Let us try the same thing but replacing the $x_1$’s with $x_2$. 
We label $x_3$ as the x-coordinate of the point of intersection with the x-axis of the tangent 
line of y = f(x) through the point $(x_2, f(x_2))$ and get: 

$$x_3 = x_2 - \frac{f(x_2)}{f'(x_2)}.$$

Again, it at least looks like $x_3$ is getting closer to $x_0$. Newton’s method is 
to continue this process, namely to set 

$$x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}.$$

For this to work, we need $x_k \to x_0$. There are difficulties. Consider the 
picture: 

[Figure: a graph of y = f(x) in which the initial point x_1 lies near a local maximum]

With this choice of initial $x_1$, the $x_k$ will certainly not approach the zero 
$x_0$, though they do appear to approach a different zero. The problem of 
course is that this choice of $x_1$ is near a local maximum, which means that 
the derivative $f'(x_1)$ is very small, forcing $x_2 = x_1 - f(x_1)/f'(x_1)$ to be far 
from $x_0$. 
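
The iteration is short to write down. The sketch below is an illustration only: the function name, the fixed number of steps and the test function $f(x) = x^2 - 2$ (whose positive zero is √2) are assumptions, and there is no safeguard against a vanishing derivative, which is exactly the difficulty just described.

```python
def newton(f, f_prime, x1, steps=20):
    """Iterate x_{k+1} = x_k - f(x_k) / f'(x_k), starting from x1."""
    x = x1
    for _ in range(steps):
        x = x - f(x) / f_prime(x)
    return x

# f(x) = x^2 - 2 has the zero sqrt(2); start from x1 = 1.
root = newton(lambda x: x * x - 2, lambda x: 2 * x, x1=1.0)
print(root, 2 ** 0.5)
```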

We will now make this technically correct. Here we will see many ideas 
from calculus playing a critical role in proving that Newton’s method will, 
subject to specific conditions, always produce an approximation to the true 
zero. We will look at functions f : [a,b] → [a,b] which have continuous 
second derivatives, i.e., functions in the vector space $C^2[a,b]$. As an aside, 
we will be using throughout the Mean Value Theorem, which states that 
for any function $f \in C^2[a,b]$, there exists a number c with a ≤ c ≤ b such 
that 

$$f'(c) = \frac{f(b) - f(a)}{b - a}.$$


Our goal is: 


Theorem 16.5.1 Let $f \in C^2[a,b]$. Suppose there exists a point $x_0 \in [a,b]$ 
with $f(x_0) = 0$ but $f'(x_0) \ne 0$. Then there exists a δ > 0 such that, given 
any point $x_1$ in $[x_0 - \delta, x_0 + \delta]$, if for all k we define 

$$x_k = x_{k-1} - \frac{f(x_{k-1})}{f'(x_{k-1})},$$

we have that $x_k \to x_0$. 


This theorem states that Newton’s method will produce an approximation
of the zero provided our initial choice $x_1$ is close enough to the zero.
Proof: We will alter the problem from finding a zero of a function $f$ to the
finding of a fixed point of a function $g$. Set

$$g(x) = x - \frac{f(x)}{f'(x)}.$$





Note that $f(x_0) = 0$ if and only if $g(x_0) = x_0$. We will show that Newton’s
method will produce an approximation to a fixed point of $g$.

We first need to see how to choose our $\delta > 0$. By taking derivatives and
doing a bit of algebra, we have

$$g'(x) = \frac{f(x) f''(x)}{(f'(x))^2}.$$
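
The bit of algebra is just the quotient rule applied to $g(x) = x - f(x)/f'(x)$. Written out, assuming $f'(x) \neq 0$ so that the quotient makes sense, the steps are:

```latex
\begin{align*}
g'(x) &= 1 - \frac{f'(x)\,f'(x) - f(x)\,f''(x)}{(f'(x))^2} \\
      &= 1 - 1 + \frac{f(x)\,f''(x)}{(f'(x))^2} \\
      &= \frac{f(x)\,f''(x)}{(f'(x))^2}.
\end{align*}
```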


Since the second derivative of $f$ is still a continuous function, we have that
$g'(x)$ is a continuous function. Further, since $f(x_0) = 0$, we have that
$g'(x_0) = 0$. By continuity, given any positive number $a$, there exists a
$\delta > 0$ such that for all $x \in [x_0 - \delta, x_0 + \delta]$, we have

$$|g'(x)| < a.$$


We choose a to be strictly less than one (the reason for this restriction will 
be clear in a moment). 
We will reduce the problem to proving the following three lemmas: 


Lemma 16.5.1 Let $g : [a,b] \to [a,b]$ be any continuous function. Then
there is a fixed point in $[a,b]$.
Lemma 16.5.2 Let $g : [a,b] \to [a,b]$ be any differentiable function such
that for all $x \in [a,b]$ we have

$$|g'(x)| < a < 1$$

for some constant $a$. Then there is a unique fixed point in the interval $[a,b]$.


Lemma 16.5.3 Let $g : [a,b] \to [a,b]$ be any differentiable function such
that for all $x \in [a,b]$ we have

$$|g'(x)| < a < 1$$

for some constant $a$. Then given any $x_1 \in [a,b]$, if we set

$$x_{k+1} = g(x_k),$$

then the $x_k$ will approach the fixed point of $g$.


Assume briefly that all three lemmas are true. Note by our choice of
$\delta$, we have the function $g(x) = x - \frac{f(x)}{f'(x)}$ satisfying each of the conditions
in the above lemmas. Further we know that the zero $x_0$ of the function
$f(x)$ is the fixed point of $g(x)$. Then we know that iterating any point in
$[x_0 - \delta, x_0 + \delta]$ by $g$, we will approach $x_0$. But writing out this iteration is
precisely Newton’s method.

Now to prove the lemmas. 
Proof of first lemma: This will be a simple application of the Intermedi- 
ate Value Theorem. If g(a) = a or if g(b) = b, then a or b is our fixed point 
and we are done. Suppose neither holds. Since the range of g is contained 
in the interval [a,b], this means that 


a < g(a) and b > g(b). 


Set

$$h(x) = x - g(x).$$

This new function is continuous and has the property that

$$h(a) = a - g(a) < 0$$

and

$$h(b) = b - g(b) > 0.$$

By the Intermediate Value Theorem, there must be a $c \in [a,b]$ with

$$h(c) = c - g(c) = 0,$$

giving us our fixed point. □
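
The Intermediate Value Theorem only asserts that such a $c$ exists. One way to actually locate it, which is our addition and not part of the proof, is to bisect on the sign of $h(x) = x - g(x)$. The sketch below assumes $g$ is continuous and maps $[a,b]$ into itself. The example $g = \cos$ on $[0,1]$ and the names `fixed_point_by_bisection` and `steps` are illustrative choices.

```python
import math


def fixed_point_by_bisection(g, a, b, steps=60):
    """Locate c in [a, b] with g(c) = c by bisecting on h(x) = x - g(x).

    Assumes g is continuous and maps [a, b] into [a, b], so that
    h(a) <= 0 <= h(b). The parameter steps is an illustrative choice.
    """
    def h(x):
        return x - g(x)

    lo, hi = a, b
    if h(lo) == 0:
        return lo
    if h(hi) == 0:
        return hi
    for _ in range(steps):
        mid = (lo + hi) / 2
        if h(mid) <= 0:
            lo = mid   # h(mid) <= 0 < h(hi): a zero of h lies in [mid, hi]
        else:
            hi = mid   # h(lo) <= 0 < h(mid): a zero of h lies in [lo, mid]
    return (lo + hi) / 2


c = fixed_point_by_bisection(math.cos, 0.0, 1.0)
print(c, math.cos(c))
```

Each pass halves an interval on which $h$ changes sign, so the returned $c$ satisfies $g(c) = c$ up to floating-point accuracy.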
Proof of second lemma: We will now use the Mean Value Theorem.
A fixed point exists by the first lemma, since a differentiable function is
continuous. Suppose there are two distinct fixed points, $c_1$ and $c_2$. Label
these points so that $c_1 < c_2$. By the Mean Value Theorem, there is some
number $c$ with $c_1 < c < c_2$ such that

$$\frac{g(c_2) - g(c_1)}{c_2 - c_1} = g'(c).$$

Since $g(c_1) = c_1$ and $g(c_2) = c_2$, we have

$$g'(c) = \frac{c_2 - c_1}{c_2 - c_1} = 1.$$

Here is our contradiction, as we assumed that at all points the absolute
value of the derivative was strictly less than one. There cannot be two fixed
points. □
Proof of third lemma: This will be another application of the Mean
Value Theorem. By the second lemma, we know that $g$ has a unique fixed
point. Call this fixed point $x_0$. We will regularly replace $x_0$ by $g(x_0)$.

Our goal is to show that $|x_k - x_0| \to 0$. We will show that for all $k$

$$|x_k - x_0| \le a|x_{k-1} - x_0|.$$

Then by shifting subscripts we will have

$$|x_{k-1} - x_0| \le a|x_{k-2} - x_0|,$$

which will mean that

$$|x_k - x_0| \le a|x_{k-1} - x_0| \le a^2|x_{k-2} - x_0| \le \cdots \le a^{k-1}|x_1 - x_0|.$$

Since $a$ is strictly less than one, we will have $|x_k - x_0| \to 0$.
Now

$$|x_k - x_0| = |g(x_{k-1}) - g(x_0)|.$$

By the Mean Value Theorem, there is some point $c$ between $x_0$ and $x_{k-1}$
with

$$\frac{g(x_{k-1}) - g(x_0)}{x_{k-1} - x_0} = g'(c),$$

which is equivalent to

$$g(x_{k-1}) - g(x_0) = g'(c)(x_{k-1} - x_0).$$

Then

$$|g(x_{k-1}) - g(x_0)| = |g'(c)||x_{k-1} - x_0|.$$

Now we just have to observe that by assumption $|g'(c)| < a$, and we are
done. □
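
The contraction estimate can be watched numerically. The following Python sketch iterates $x_{k+1} = g(x_k)$ for the illustrative choice $g = \cos$ on $[0,1]$, which maps $[0,1]$ into itself and satisfies $|g'(x)| = |\sin x| < 0.85$ there. The constant $a = 0.85$, the starting point, and the use of the last iterate as a stand-in for the fixed point are our assumptions, not part of the proof.

```python
import math


def iterate(g, x1, n):
    """Return [x_1, x_2, ..., x_n] with x_{k+1} = g(x_k)."""
    xs = [x1]
    for _ in range(n - 1):
        xs.append(g(xs[-1]))
    return xs


g = math.cos   # maps [0, 1] into [cos 1, 1], a subset of [0, 1]
a = 0.85       # |g'(x)| = |sin x| <= sin 1 < 0.85 < 1 on [0, 1]

xs = iterate(g, 1.0, 200)
x0 = xs[-1]    # stand-in for the fixed point after many iterations
errors = [abs(x - x0) for x in xs]

# Each step should shrink the distance to the fixed point by at least the
# factor a; the small slack absorbs floating-point rounding.
contracts = all(e_next <= a * e_prev + 1e-15
                for e_prev, e_next in zip(errors, errors[1:]))
print(x0, contracts)
```

The check `contracts` is the inequality $|x_k - x_0| \le a|x_{k-1} - x_0|$ tested at every step of the run.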
All this theorem is telling us is that if we start with an initial point close
enough to the zero of a function, Newton’s method will indeed converge to 
the zero. It does not tell us how to make our initial choice and does not 
tell us the speed of the convergence. 

Now let us see how to try to use Newton’s method in other contexts.
Suppose we have a map $L : V \to W$ from one vector space to another. How
can we approximate a zero of this map? Let us assume that there is some
notion of a derivative for the map $L$, which we will denote by $DL$. Then,
formally following Newton’s method, we might, starting with any
element $v_1 \in V$, recursively define

$$v_{k+1} = v_k - DL(v_k)^{-1} L(v_k)$$

and hope that the $v_k$ will approach the zero of the map. This could be at
least an outline of a general approach. The difficulties are in understanding
$DL$ and in particular in dealing with when $DL$ has some type of inverse.

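For example, consider a function $F : \mathbb{R}^2 \to \mathbb{R}^2$, given in local
coordinates by

$$F(x,y) = (f_1(x,y), f_2(x,y)).$$

The derivative of $F$ should be the two-by-two Jacobian matrix

$$DF = \begin{pmatrix} \frac{\partial f_1}{\partial x} & \frac{\partial f_1}{\partial y} \\ \frac{\partial f_2}{\partial x} & \frac{\partial f_2}{\partial y} \end{pmatrix}.$$

Starting with any $(x_1, y_1) \in \mathbb{R}^2$, we set

$$\begin{pmatrix} x_{k+1} \\ y_{k+1} \end{pmatrix} = \begin{pmatrix} x_k \\ y_k \end{pmatrix} - DF^{-1}(x_k, y_k) \cdot \begin{pmatrix} f_1(x_k, y_k) \\ f_2(x_k, y_k) \end{pmatrix}.$$

Newton’s method will work if the $(x_k, y_k)$ approach a zero of $F$. By
placing appropriate restrictions on the zero of $F$, such as requiring that
$\det(DF(x_0, y_0)) \neq 0$, we can find an analogous proof to the one-dimensional
case. In fact, it generalizes to any finite dimension.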
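
As a small illustration of the two-dimensional iteration, here is a Python sketch that inverts the Jacobian with the explicit two-by-two formula. The example map $F(x,y) = (x^2 + y^2 - 4, x - y)$, its zero $(\sqrt{2}, \sqrt{2})$, the starting point, and the names `newton_2d`, `tol` and `max_iter` are all hypothetical choices made for this sketch.

```python
import math


def newton_2d(F, DF, v1, tol=1e-12, max_iter=50):
    """Newton iteration v_{k+1} = v_k - DF(v_k)^{-1} F(v_k) in the plane.

    F returns a pair (f1, f2); DF returns the Jacobian as
    ((df1/dx, df1/dy), (df2/dx, df2/dy)). tol and max_iter are
    illustrative stopping parameters (assumptions).
    """
    x, y = v1
    for _ in range(max_iter):
        f1, f2 = F(x, y)
        (a, b), (c, d) = DF(x, y)
        det = a * d - b * c
        if det == 0:
            raise ZeroDivisionError("Jacobian is not invertible at this point")
        # (dx, dy) = DF^{-1} (f1, f2), using the explicit 2x2 inverse.
        dx = (d * f1 - b * f2) / det
        dy = (-c * f1 + a * f2) / det
        x, y = x - dx, y - dy
        if max(abs(dx), abs(dy)) < tol:
            return x, y
    raise RuntimeError("no convergence within max_iter steps")


def F(x, y):
    return (x * x + y * y - 4.0, x - y)


def DF(x, y):
    return ((2.0 * x, 2.0 * y), (1.0, -1.0))


root = newton_2d(F, DF, (1.0, 2.0))
print(root)
```

At the zero $(\sqrt{2}, \sqrt{2})$ the determinant of the Jacobian is $-4\sqrt{2} \neq 0$, so the restriction mentioned above is satisfied for this example.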

More difficult problems occur for infinite dimensional spaces V and W. 
These naturally show up in the study of differential equations. People 
still try to follow a Newton-type method, but now the difficulty of dealing 
with the right notion for DL becomes a major stumbling block. This is 
why in trying to solve differential equations you are led to the study of 
infinite dimensional linear maps and are concerned with the behavior of the 
eigenvalues, since you want to control and understand what happens when 
the eigenvalues are, or are close to, zero, for this is the key to controlling 
the inverse of DL. The study of such eigenvalue questions falls under the 
rubric of Spectral Theorems, which is why the Spectral Theorem is a major 
part of beginning Functional Analysis and a major tool in PDE theory. 
16.6 Books


The basic text for algorithms is Introduction to Algorithms by Cormen, Leis- 
erson and Rivest [22]. Another source is Data Structures and Algorithms 
by Aho, Hopcroft, and Ullman [2]. 

Numerical Analysis has a long history. Further, many people, with 
widely varying mathematical backgrounds, need to learn some numerical 
analysis. Thus there are many beginning texts (though it must be stated 
that my knowledge of these texts is limited). 

Atkinson’s Introduction to Numerical Analysis [5] comes highly recom- 
mended. Another basic text that has long been the main reference for 
people studying for the numerical methods part of the actuarial exams is 
Numerical Methods by Burden and Faires [15]. Trefethen and Bau’s text
[112] is a good source for numerical methods for linear algebra. For numeri- 
cal methods for differential equations, good sources are the books by Iserles 
[66] and Strikwerda [110]. Finally, for links with optimization theory, there 
is Ciarlet’s Introduction to Numerical Linear Algebra and Optimization [19]. 


16.7 Exercises 


1. Show that there are infinitely many nonisomorphic graphs, each having
exactly $k$ vertices.

2. How many nonisomorphic graphs with exactly three vertices and four 
edges are there? 

3. Assume that the time for multiplying and adding two numbers together 
is exactly one. 

a. Find an algorithm that runs in time $(n-1)$ that adds $n$ numbers
together.

b. Find an algorithm that computes the dot product of two vectors in
$\mathbb{R}^n$ in time $(2n-1)$.

c. Assume that we can work in parallel, meaning that we allow algorithms
that can compute items that do not depend on each other simultaneously.
Show that we can add $n$ numbers together in time $\log_2(n - 1)$.

d. Find an algorithm that computes the dot product of two vectors in
$\mathbb{R}^n$ in parallel in time $\log_2(n)$.

4. Let A be the adjacency matrix for a graph G. 

a. Show that there is a nonzero $(i,j)$ entry of the matrix $A^2$ if and only
if there is a path containing two edges from vertex $i$ to vertex $j$.

b. Generalize part (a) to linking entries in the matrix $A^k$ to the existence
of paths between various vertices having exactly $k$ edges.

c. Find an algorithm that determines whether or not a given graph is
connected.
5. Use Newton’s method, with a calculator, to approximate $\sqrt{2}$ by
approximating a root of the polynomial $x^2 - 2$.
6. Let $f : \mathbb{R}^n \to \mathbb{R}^n$ be any differentiable function from $\mathbb{R}^n$ to itself. Let
$x_0$ be a point in $\mathbb{R}^n$ with $f(x_0) = 0$ but with $\det(Df(x_0)) \neq 0$, where $Df$
denotes the Jacobian of the function $f$. Find a function $g : \mathbb{R}^n \to \mathbb{R}^n$ that
has the point $x_0$ as a fixed point.


Appendix A 


Equivalence Relations 


Throughout this text we have used equivalence relations. Here we collect 
some of the basic facts about equivalence relations. In essence, an equiva- 
lence relation is a generalization of equality. 


Definition A.0.1 (Equivalence Relation) An equivalence relation on a
set $X$ is any relation ‘$x \sim y$’ for $x, y \in X$ such that


1. (Reflexivity) For any $x \in X$, we have $x \sim x$.
2. (Symmetry) For all $x, y \in X$, if $x \sim y$ then $y \sim x$.
3. (Transitivity) For all $x, y, z \in X$, if $x \sim y$ and $y \sim z$, then $x \sim z$.


The basic example is that of equality. Another example would be when
$X = \mathbb{R}$ and we say that $x \sim y$ if $x - y$ is an integer. On the other hand, the
relation $x \sim y$ if $x < y$ is not an equivalence relation, as it is not symmetric.

We can also define equivalence relations in terms of subsets of the ordered
pairs $X \times X$ as follows:


Definition A.0.2 (Equivalence Relation) An equivalence relation on a
set $X$ is a subset $R \subset X \times X$ such that


1. (Reflexivity) For any $x \in X$, we have $(x,x) \in R$.
2. (Symmetry) For all $x, y \in X$, if $(x,y) \in R$ then $(y,x) \in R$.
3. (Transitivity) For all $x, y, z \in X$, if $(x,y) \in R$ and $(y,z) \in R$, then
$(x,z) \in R$.


The link between the two definitions is of course that $x \sim y$ means the
same as $(x,y) \in R$.

An equivalence relation will split the set X into disjoint subsets, the 
equivalence classes. 
Definition A.0.3 (Equivalence Classes) An equivalence class $C$ is a
subset of $X$ such that if $x, y \in C$, then $x \sim y$ and if $x \in C$ and if $x \sim y$,
then $y \in C$.


The various equivalence classes are disjoint, a fact that follows from tran- 
sitivity. 
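
For a finite set, the subset definition can be checked directly. The Python sketch below is our own illustration: it tests the three axioms for a relation given as a set of ordered pairs and then collects the equivalence classes. The set $X = \{0, \ldots, 5\}$, the relation "$x - y$ is divisible by 3", and the function names are hypothetical choices.

```python
def is_equivalence_relation(X, R):
    """Check the three axioms for a relation R, given as a set of pairs in X x X."""
    reflexive = all((x, x) in R for x in X)
    symmetric = all((y, x) in R for (x, y) in R)
    transitive = all((x, w) in R
                     for (x, y) in R for (z, w) in R if y == z)
    return reflexive and symmetric and transitive


def equivalence_classes(X, R):
    """Split X into classes; assumes R is an equivalence relation on X."""
    classes = []
    for x in X:
        if not any(x in C for C in classes):
            classes.append({y for y in X if (x, y) in R})
    return classes


X = range(6)
same_mod_3 = {(x, y) for x in X for y in X if (x - y) % 3 == 0}
less_than = {(x, y) for x in X for y in X if x < y}

print(is_equivalence_relation(X, same_mod_3))   # True
print(is_equivalence_relation(X, less_than))    # False: not symmetric
print(equivalence_classes(X, same_mod_3))
```

The classes found for the first relation are $\{0,3\}$, $\{1,4\}$ and $\{2,5\}$, which are disjoint and together cover $X$.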

Exercises: 1. Let $G$ be a group and $H$ a subgroup. Define, for $x, y \in G$,
$x \sim y$, whenever $xy^{-1} \in H$. Show that this forms an equivalence relation
on the group $G$.

2. For any two sets A and B, define A ~ B if there is a one-to-one, onto 
map from A to B. Show that this is an equivalence relation. 

3. Let $(v_1, v_2, v_3)$ and $(w_1, w_2, w_3)$ be two collections of three vectors in $\mathbb{R}^n$.
Define $(v_1, v_2, v_3) \sim (w_1, w_2, w_3)$ if there is an element $A \in GL(n, \mathbb{R})$ such
that $Av_1 = w_1$, $Av_2 = w_2$ and $Av_3 = w_3$. Show that this is an equivalence
relation.

4. On the real numbers, say that $x \sim y$ if $x - y$ is a rational number.
Show that this forms an equivalence relation on the real numbers. (This
equivalence was used in Chapter Ten, in the proof that there exist
nonmeasurable sets.)